Abstract: We use a newly released version of the SuperBayeS code to analyze the impact
of the choice of priors and the influence of various constraints on the statistical conclusions
for the preferred values of the parameters of the Constrained MSSM. We assess the effect
in a Bayesian framework and compare it with an alternative likelihood-based measure, the
profile likelihood. We employ a new scanning algorithm (MultiNest) which increases the
computational efficiency by a factor of ~200 with respect to previously used techniques. We
demonstrate that the currently available data are not yet sufficiently constraining to allow
one to determine the preferred values of CMSSM parameters in a way that is completely
independent of the choice of priors and statistical measures. While BR($B \to X_s\gamma$) generally
favors large $m_0$, this is in some contrast with the preference for low values of $m_0$ and $m_{1/2}$,
which is almost entirely a consequence of a combination of prior effects and a single constraint
coming from the anomalous magnetic moment of the muon, which remains somewhat
controversial. Using an information-theoretical measure, we find that the cosmological dark
matter abundance determination provides at least 80% of the total constraining power of all
available observables. Despite the remaining uncertainties, prospects for direct detection
in the CMSSM remain excellent, with the spin-independent neutralino-proton cross section
almost guaranteed to lie above $\sigma_p^{\rm SI} \sim 10^{-10}$ pb, independently of the choice of priors or statistics.
Likewise, gluino and lightest Higgs discovery at the LHC remain highly encouraging. While
in this work we have used the CMSSM as the particle physics model, our formalism and
scanning technique can be readily applied to a wider class of models with several free
parameters.
1. Introduction

Experiments at the Large Hadron Collider (LHC) will soon start testing many frameworks 
of particle physics beyond the Standard Model (SM). Particular attention will be given to 
the Minimal Supersymmetric SM (MSSM) and other effective low-energy models involving 
softly-broken supersymmetry (SUSY) which remain by far the most theoretically developed 
and popular schemes. On another front, dark matter (DM) experiments have by now 
reached the level of sensitivity that would allow them to detect a signal from DM if it is
made up of the lightest neutralino, whose abundance as cold dark matter (CDM) is now very
well constrained thanks to WMAP and other cosmic microwave background observations.
With enough effort, Tevatron experiments may be able to improve the final LEP limit on
the SM-like Higgs boson, and perhaps even detect it. Heavy quark experiments continue
improving constraints on allowed contributions from "new physics" (be it SUSY or some
other framework) to several observables related to flavor. Finally, an apparent discrepancy,
at the level of about 3σ, between experiment and SM predictions (based on $e^+e^-$ data) for
the anomalous magnetic moment of the muon has now persisted for several years.

In light of the expected vast improvement in the constraining power of data from 
the LHC and DM searches, it is essential to develop a solid formalism to allow one to 
fully explore properties of popular low-energy SUSY and other models, and to reliably 
derive ensuing experimental implications. Until a few years ago, a somewhat oversimplified
approach based on fixed-grid scans of subsets of parameter space was sufficient. Such scans
imposed observational constraints on the grid in a rigid "in-or-out" fashion (e.g., points
outside some arbitrary 1σ or 2σ experimental range of a given observable were discarded),
without paying attention to the varying degree with which points could reproduce the
data. The points on the grid surviving all the constraints were then used to qualitatively
evaluate the impact of the data thus applied and the ensuing predictions for various observables.
A major drawback of the approach was, however, that it did not allow for a probabilistic
interpretation of results. A step in the right direction was to employ a chi-square analysis
where, for example, the question of more properly weighting experimental errors could be
addressed [1, 2, ?]. However, the approach remains of limited use as it does not allow one
to perform a full scan over all relevant parameters. A major improvement in this direction
has been provided by employing a Markov Chain Monte Carlo (MCMC) algorithm [?],
linked with Bayesian statistics [?].

Bayesian methods coupled with MCMC technology are superior in many respects
to traditional, frequentist grid scans of the parameter space. (For an introduction, see,
e.g., [?].) For a start, they are much more efficient, in that the computational effort
required to explore a parameter space of dimension N scales roughly proportionally with
N. In contrast, on a grid scan with k points per dimension, the number of likelihood
evaluations required goes as $k^N$; hence this approach becomes computationally prohibitive
even for parameter spaces of moderate dimensionality. Secondly, the Bayesian approach
allows one to easily incorporate into the final inference all relevant sources of uncertainty.
For a given SUSY model one can include relevant SM (nuisance) parameters and their
associated experimental errors, with the uncertainties automatically propagated to give
the final uncertainty on the SUSY parameters of interest. In addition, theoretical uncertainties
can be easily included in the likelihood (see [?]). Thirdly, another key advantage
is the possibility to marginalize (i.e., integrate over) additional ("hidden") dimensions in
the parameter space of interest with very little computational effort. By "hidden dimensions"
we mean here the parameters other than the ones being plotted, for example in
1-dimensional or 2-dimensional plots. In this paper, we upgrade our scanning technique to a
much more efficient algorithm called MultiNest, which very significantly reduces the
computational burden of a full exploration of the parameter space.

These advantages are built into the Bayesian procedure. The latter also requires the 
specification of a prior probability distribution function (or simply prior), describing our 
state of knowledge about the problem before we see the data. One of the main aims of this 
study is to assess the influence of prior choice on the statistical conclusions on CMSSM 
parameters. A number of recent studies have investigated the impact of several choices
of priors on the parameter inference [?, 10, 11, 19] in the context of the Constrained
Minimal Supersymmetric Standard Model (CMSSM) [20], and found it to be rather strong.
The CMSSM, because of its relative simplicity, is a model of much interest.

The goal of the paper is twofold. On one side we address the question of the origin
of the strong prior dependence. First, we point out and examine the impact on SUSY
parameter inference of the highly non-linear nature of the mapping from the CMSSM
parameters to the observable quantities. Next, we adopt two different priors (flat on a linear
scale and flat on a log scale, see below). Within each we explore in detail, and compare, the
impact of several observables which have been known to play a major role in constraining
the CMSSM parameter space, including LEP bounds on Higgs properties, BR($B \to X_s\gamma$),
the relic abundance $\Omega_\chi h^2$ of the lightest neutralino, assumed to constitute most of the CDM
in the Universe, and the anomalous magnetic moment of the muon $(g-2)_\mu$. It is the last
observable that we find to play a singular role in favoring lower values of superpartner masses, in
some tension with some other observables, especially BR($B \to X_s\gamma$), which favors larger
scalar masses [12].

The other major aim of our paper is to compare the Bayesian posterior probability 
distribution with the statistical measure of a profile likelihood in the context of prior 
dependence. We conclude that the profile likelihood may provide a more robust assessment 
of the favored regions of CMSSM parameters with respect to volume effects generated by the 
prior choice. The coverage properties of this measure will be studied elsewhere. We focus 
here on the CMSSM which we treat as a case study. The problem of prior dependence is 
likely to be even more severe for more complicated SUSY models given present constraints, 
although better data such as, e.g., sparticle and Higgs detection at LHC are expected to 
cure it. 

The paper is organized as follows. In section 2 we review the statistical formalism
used in this work. In section 3 we focus on the CMSSM and introduce our experimental
constraints, before exploring in section 4 the impact of priors and observables on inferences
on the SUSY parameter space. In section 5 we examine in more detail the consistency of
the various observational constraints and focus in particular on the tension between $(g-2)_\mu$
and BR($B \to X_s\gamma$). We also quantify the information content (i.e., the constraining power)
of each observable. Implications of parameter inferences for gluino and light Higgs searches
at the LHC and for direct detection searches of DM are outlined in section 6, and our
conclusions are presented in section 7. In Appendix A we give a brief description of the
MultiNest algorithm.
2. Statistical formalism



2.1 Statistical framework 

Let us denote a set of parameters of a model under consideration by $\theta$, and by $\psi$ all other
relevant parameters (so-called "nuisance parameters"). Both sets form our "basis parameters"

$$m = (\theta, \psi). \qquad (2.1)$$

The cornerstone of Bayesian inference is provided by Bayes' theorem, which reads

$$p(m|d) = \frac{p(d|\xi)\,\pi(m)}{p(d)}. \qquad (2.2)$$

The quantity $p(m|d)$ on the l.h.s. of eq. (2.2) is called the posterior probability density
function (posterior pdf, or simply the posterior). On the r.h.s., the quantity $p(d|\xi)$, taken
as a function of $\xi$ for fixed data $d$, is called the likelihood (where the dependence $\xi(m)$ is
understood). The likelihood supplies the information provided by the data. In the case
of the CMSSM which we will consider below, it is constructed in Sec. 3.1 of ref. [?]. The
quantity $\pi(m)$ denotes the prior probability density function (prior pdf, or simply the prior),
which encodes our state of knowledge about the values of the parameters in $m$ before we see
the data. The prior state of knowledge is then updated to the posterior via the likelihood.
Much care must be exercised in assessing the impact of priors on the final inference on the
model's properties. If the posterior strongly depends on the choice of priors, then this is a
signal that the available data are not sufficiently constraining to override the prior, and hence
the information content of the posterior is strongly influenced by the choice of the prior.
Therefore judgement must be suspended until more constraining data become available,
unless there is a strong physical motivation for a specific choice of priors. (For example,
in some simple situations the prior follows from considerations of the invariance properties
of the problem.)

Finally, the quantity in the denominator is called the evidence or model likelihood. If one is
interested in constraining the model's parameters, the evidence is merely a normalization
constant, independent of $m$, and can therefore be dropped. The evidence is, however, very
useful in the context of Bayesian model comparison (see e.g. [?]); in this work we
will use it instead to quantify the constraining power of each observable. The evidence
is a multi-dimensional integral over the model's parameter space $m$ (including nuisance
parameters):¹

$$p(d) = \int p(d|\xi)\,\pi(m)\,dm. \qquad (2.3)$$



¹More precisely, one should write the evidence as p(d|model), in order to show explicitly that it is
conditional on the assumption that the model is the true theory. From there one can further employ Bayes'
theorem to obtain the posterior probability for the model given the observed data, namely
p(model|d). This is the subject of Bayesian model comparison (see e.g. [?] for an illustration). Here we
do not employ the evidence for this purpose (see instead [16] for applications to the CMSSM), and
therefore drop the explicit conditioning on the model under study, although in the following one should
always interpret p(d) = p(d|model).

In our previous work [?, 11, 13, 14], we employed an MCMC algorithm to map out
the posterior pdf via eq. (2.2). As extensively described in [?], the purpose of the MCMC
algorithm is to construct a sequence of points in parameter space (called "a chain") whose
density is proportional to the posterior pdf. The sequence of points thus obtained gives
a series of samples from the posterior, which are weighted in such a way as to reflect the
relative probability of the various regions in parameter space.

In this work we upgrade our scanning technique to use a novel algorithm, MultiNest [?],
which is based on the framework of Nested Sampling, recently invented by Skilling [22].
MultiNest has been developed in such a way as to be an extremely efficient sampler even
for likelihood functions defined over a parameter space of large dimensionality with a
very complex structure. This aspect is very important for multi-parameter models. For
example, previous MCMC scans have revealed that the 8-dimensional likelihood surface
of the CMSSM can be very fragmented, and that it features many finely tuned regions
that are difficult to explore with conventional MCMC and grid scans. Therefore we adopt
MultiNest as an efficient sampler of the posterior. We have compared the results with our
MCMC algorithm and found that they are identical (up to numerical noise). The main
motivation is the increased sampling efficiency (which improves computational efficiency
by a factor of ~200 with respect to our previous MCMC algorithm) and the possibility
of automatically computing the Bayesian evidence, which we use in this work to quantify
the amount of information in the various observables.² We give a brief description of the
MultiNest algorithm in Appendix A.
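The basic Nested Sampling idea that MultiNest builds on can be illustrated with a short Python sketch. This is a textbook version of Skilling's algorithm and not MultiNest itself. The lowest-likelihood live point is replaced by a new prior draw of higher likelihood, found here by plain rejection sampling (MultiNest uses ellipsoidal bounds for this step instead). The sketch assumes an expected volume shrinkage of exp(−1/n_live) per iteration and stops once the live points can no longer change the evidence by more than a fraction `tol`. The names `nested_sampling`, `log_like` and `sample_prior`, and the 1D toy problem, are hypothetical illustrations.

```python
import math
import random


def log_add(a, b):
    """Numerically stable ln(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def nested_sampling(log_like, sample_prior, n_live=200, tol=1e-2, max_iter=100_000, seed=0):
    """Textbook nested sampling estimate of ln p(d) (toy problems only).

    log_like(x)       -> ln L at the parameter point x
    sample_prior(rng) -> one draw from the prior
    New live points come from brute-force rejection sampling of the prior,
    which is only viable in low dimensions.
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_like(x) for x in live]
    log_z = -math.inf  # running ln(evidence)
    log_x = 0.0        # ln of the prior volume still enclosed by the live points
    # ln[(X_i - X_{i+1}) / X_i] for the expected shrinkage X_{i+1} = X_i * exp(-1/n_live)
    log_shell = math.log1p(-math.exp(-1.0 / n_live))
    for _ in range(max_iter):
        # Stop when the live points can no longer change ln Z appreciably
        if max(live_logl) + log_x < log_z + math.log(tol):
            break
        worst = min(range(n_live), key=live_logl.__getitem__)
        l_min = live_logl[worst]
        log_z = log_add(log_z, l_min + log_x + log_shell)
        log_x -= 1.0 / n_live
        # Replace the worst point by a prior draw with a higher likelihood
        while True:
            x = sample_prior(rng)
            logl = log_like(x)
            if logl > l_min:
                break
        live[worst], live_logl[worst] = x, logl
    # The remaining live points share the leftover prior volume equally
    for logl in live_logl:
        log_z = log_add(log_z, logl + log_x - math.log(n_live))
    return log_z


if __name__ == "__main__":
    # Toy problem: uniform prior on [-5, 5] and a normalized unit Gaussian likelihood
    def log_like(x):
        return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

    def sample_prior(rng):
        return rng.uniform(-5.0, 5.0)

    print(nested_sampling(log_like, sample_prior), math.log(0.1))
```

For this toy problem the exact evidence is very close to 0.1. The printed estimate should therefore agree with ln 0.1 ≈ −2.30 to within the statistical scatter of nested sampling, roughly ±0.1 for 200 live points.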

2.2 Statistical measures 

Once a sequence of $M$ samples drawn from the posterior, $m^{(t)}$ ($t = 0, 1, \ldots, M-1$), becomes
available, it becomes a trivial task to obtain Monte Carlo estimates of expectations for any
function of the parameters. For example, the posterior mean is given by

$$\langle m \rangle = \int p(m|d)\, m\, dm \approx \frac{1}{M}\sum_{t=0}^{M-1} m^{(t)}, \qquad (2.4)$$

where $\langle\cdot\rangle$ denotes the expectation value with respect to the posterior, and the approximate
equality with the mean of the samples follows because the samples $m^{(t)}$ are generated from
the posterior by construction. In general, one can easily obtain the expectation value of any
function of the parameters $f(m)$ as

$$\langle f(m) \rangle \approx \frac{1}{M}\sum_{t=0}^{M-1} f(m^{(t)}). \qquad (2.5)$$
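As a minimal illustration of eqs. (2.4) and (2.5), the Python sketch below averages an arbitrary function over a list of samples. The optional `weights` argument is an assumption added for convenience. Nested-sampling output usually comes as weighted samples rather than an equally weighted chain, and in that case the plain average becomes a weighted one. The function name and the Gaussian toy chain are hypothetical.

```python
import random
import statistics


def posterior_expectation(samples, f=lambda m: m, weights=None):
    """Monte Carlo estimate of <f(m)>, eqs. (2.4)-(2.5).

    With weights=None the samples are treated as equally weighted draws from
    the posterior (as for an MCMC chain); otherwise a weighted mean is returned.
    """
    values = [f(m) for m in samples]
    if weights is None:
        return statistics.fmean(values)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in for a chain: draws from a Gaussian posterior N(1, 0.5^2)
    chain = [rng.gauss(1.0, 0.5) for _ in range(100_000)]
    print(posterior_expectation(chain))                     # should be close to 1.0
    print(posterior_expectation(chain, f=lambda m: m * m))  # <m^2>, should be close to 1.25
```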

It is usually interesting to summarize the results of the inference by giving the 1-dimensional
marginal probability for $m_j$, the $j$-th element of $m$. Taking without loss of generality $j = 1$
and a parameter space of dimensionality $N$, the marginal posterior for parameter $m_1$ is
given by

$$p(m_1|d) = \int p(m|d)\,dm_2 \cdots dm_N. \qquad (2.6)$$

²A new version of our code, including MultiNest and a new interactive plotting routine (called
SuperEGO), is publicly available from www.superbayes.org. The full lists of samples used in
this work are also available at the same location. An online plotting tool is available at
http://pisrv0.pit.physik.uni-tuebingen.de/darkmatter/superbayes/index.php.



From the samples it is trivial to obtain the marginal posterior on the l.h.s. of eq. (2.6):
since the samples are drawn from the full posterior, $p(m|d)$, their density reflects the value
of the full posterior pdf. It is then sufficient to divide the range of $m_1$ into a series of bins
and count the number of samples falling within each bin, simply ignoring the coordinate
values $m_2, \ldots, m_N$. A 2-dimensional posterior is defined in an analogous fashion. A 1D
2-tail α% credible region is given by the interval (for the parameter of interest) within
which fall α% of the samples, obtained in such a way that (100 − α)/2 per cent of the
samples lie outside the interval on either side. In the case of a 1-tail upper (lower) limit,
we report the value of the quantity below (above) which α% of the samples are to be found.
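The binning procedure and the interval definitions above can be sketched in Python as follows. The helper names are hypothetical, `alpha` is given as a fraction (e.g. 0.95 for a 95% region), and the samples are assumed to be equally weighted.

```python
import math
import random


def marginal_posterior(values, lo, hi, n_bins):
    """1D marginal posterior from samples of one coordinate (eq. 2.6).

    All other coordinates are simply ignored; returns (bin centres, densities)
    normalized to unit area over [lo, hi).
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi)")
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centres, [c / (total * width) for c in counts]


def two_tail_interval(values, alpha):
    """Central credible interval holding a fraction alpha of the samples,
    with a fraction (1 - alpha) / 2 of them outside on either side."""
    s = sorted(values)
    n = len(s)
    i_lo = int(math.floor(0.5 * (1.0 - alpha) * n))
    i_hi = min(int(math.ceil(0.5 * (1.0 + alpha) * n)) - 1, n - 1)
    return s[i_lo], s[i_hi]


def upper_limit(values, alpha):
    """1-tail upper limit: the value below which a fraction alpha of the samples lie."""
    s = sorted(values)
    idx = min(max(int(math.ceil(alpha * len(s))) - 1, 0), len(s) - 1)
    return s[idx]


if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical stand-in for one column of a chain
    xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    print(two_tail_interval(xs, 0.95))  # roughly (-1.96, 1.96)
    print(upper_limit(xs, 0.95))        # roughly 1.645
```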

An alternative statistical measure to the marginal posterior given by (2.6) is the profile
likelihood, defined, say, for the parameter $m_1$ as

$$\mathcal{L}(m_1) = \max_{m_2, \ldots, m_N} \mathcal{L}(d|m), \qquad (2.7)$$

where in our case $\mathcal{L}(d|m)$ is the full likelihood function. Thus in the profile likelihood one
maximises the value of the likelihood along the hidden dimensions, rather than integrating
it out as in the marginal posterior. The profile likelihood is obtained from the samples by
maximising the value of the likelihood in each bin, and it has recently been investigated
in the context of MCMC scans of the CMSSM in [18]. The advantage is that the profile
likelihood is clearly independent of the prior. However, its numerical evaluation in a
high-dimensional parameter space is in general very difficult, especially when finely tuned regions
are present where the likelihood is large but whose volume is very small (for a given metric).
For example, a log prior on the SUSY masses will expand the volume of the low-mass
parameter region and as a consequence the algorithm will explore it in much finer detail
than would be possible with a linear prior on the masses. This might find points in
parameter space that are good fits to the data and that would otherwise have been missed
by a scan performed using a linear prior. This will be true of any scanning algorithm:
scanning in one metric (in our language, for a given prior) might in general give a different
value than the numerical evaluation of the same quantity when scanning in another metric.
To the extent that the different numerical evaluations of the same quantity disagree, one must
of course take either value with a grain of salt.³ As we shall demonstrate below, the choice
of priors influences the numerical efficiency with which different regions of parameter space
are scanned. Therefore the numerical evaluation of the profile likelihood might in general
be different for different prior (i.e., metric) choices. In the following, when we refer to the
profile likelihood in connection with the scanning results, we always mean "our numerical
evaluation of the profile likelihood".
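A Python sketch of the binned profile likelihood described above follows. It assumes each sample comes with its chi-square value, so that L ∝ exp(−χ²/2). It also illustrates the caveat just discussed: the result is only as good as the sampling, because a bin whose best-fit region was never visited reports a lower profile than the true one. The helper name and the 2D toy scan are hypothetical. In the toy, a good fit requires the hidden coordinate y to nearly equal x, and the exact profile is exp[−(x − 1)²/(2·0.5²)].

```python
import math
import random


def profile_likelihood(x, chi2, lo, hi, n_bins):
    """Numerical profile likelihood of one parameter, in the spirit of eq. (2.7).

    In every bin of x the best-fit sample (smallest chi-square, i.e. largest
    L = exp(-chi2/2)) is kept, which maximises over the hidden dimensions.
    Returns bin centres and L / L_max per bin (None where a bin is empty).
    """
    width = (hi - lo) / n_bins
    best = [math.inf] * n_bins
    for xi, c in zip(x, chi2):
        if lo <= xi < hi:
            k = min(int((xi - lo) / width), n_bins - 1)
            best[k] = min(best[k], c)
    chi2_min = min(best)
    centres = [lo + (k + 0.5) * width for k in range(n_bins)]
    profile = [math.exp(-0.5 * (b - chi2_min)) if b < math.inf else None for b in best]
    return centres, profile


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 2D toy scan: y is a hidden dimension that must track x for a good fit
    pts = [(rng.uniform(-2.0, 4.0), rng.uniform(-2.0, 4.0)) for _ in range(200_000)]
    xs = [px for px, _ in pts]
    chi2 = [((px - 1.0) / 0.5) ** 2 + ((py - px) / 0.1) ** 2 for px, py in pts]
    centres, prof = profile_likelihood(xs, chi2, -2.0, 4.0, 30)
    # Location of the highest profile value; should be a bin next to x = 1
    print(max((p, c) for p, c in zip(prof, centres) if p is not None))
```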



³Notice that this is fundamentally different from the Bayesian perspective: a change of prior changes
the posterior in Bayesian statistics, hence the mathematical function one wants to map out changes
independently of the numerical aspects of the scanning technique.

The profile likelihood can be directly interpreted as a likelihood function, except of
course that it does account for the effect of the hidden parameters. Therefore one can think
of plots of the profile likelihood as analogous to what would be obtained by performing a
more traditional fixed-grid scan in 8 dimensions, computing the chi-square at each point
and then plotting the likelihood maximised (i.e., the chi-square minimised) along the hidden
dimensions. We report confidence intervals from the profile likelihood obtained via the usual
likelihood ratio test as follows. Starting from the best-fit value in parameter space, an α%
confidence interval encloses all parameter values for which the chi-square, $\chi^2 = -2\ln\mathcal{L}$,
increases by less than $\Delta\chi^2(\alpha, n)$ from its best-fit value. The threshold value depends on α
and on the number n of parameters one is simultaneously considering (usually n = 1 or n = 2),
and it is obtained by solving (with α expressed as a fraction)

$$\alpha = \int_0^{\Delta\chi^2(\alpha, n)} \chi^2_n(x)\,dx, \qquad (2.8)$$

where $\chi^2_n(x)$ is the chi-square distribution for n degrees of freedom. The MultiNest
algorithm we employ is much more efficient than a standard grid scan in parameter space,
and it allows one to explore the full multi-dimensional parameter space at once. Therefore
our scanning algorithm, when coupled with the profile likelihood, can be understood as
an extremely efficient shortcut for the evaluation of the minimum chi-square in a
multi-dimensional parameter space. However, the MultiNest technique (or indeed, any other
Bayesian procedure) is not particularly optimized to look for isolated points with large
likelihood in the parameter space. This means that the profile likelihood is derived from
a necessarily sparse sampling of our 8-dimensional parameter space, and it might well be
that regions with large likelihood that occupy a very small volume in parameter space are
missed altogether. An analogous problem would, however, also appear if the scan were
done with a traditional grid technique, which would find multiple maxima in the likelihood
if executed in the 8-dimensional parameter space (grid scans to date have never been able to
deal with such a high-dimensional parameter space at sufficient resolution). Nevertheless,
Bayesian technology and the MultiNest algorithm give several orders of magnitude
improvement in the efficiency of the scan, thereby allowing one, for the first time, to undertake
a detailed analysis of the impact of the data when applied one by one or simultaneously to
the whole parameter space.
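For the two cases used in practice, eq. (2.8) has closed-form solutions, so the thresholds can be obtained without a general chi-square inverse-CDF routine. The Python sketch below uses only the standard library, and its function names are hypothetical. It relies on two facts: a χ² variable with one degree of freedom is the square of a standard normal variable, and the χ² CDF with two degrees of freedom is 1 − exp(−x/2). At 95% it gives the familiar Δχ² ≈ 3.84 (n = 1) and ≈ 5.99 (n = 2).

```python
import math
from statistics import NormalDist


def delta_chi2(alpha, n):
    """Threshold Delta chi^2(alpha, n) solving eq. (2.8), with alpha as a fraction."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if n == 1:
        # chi^2 with 1 dof is z^2 with z standard normal: P(|z| < t) = alpha
        z = NormalDist().inv_cdf(0.5 * (1.0 + alpha))
        return z * z
    if n == 2:
        # chi^2 with 2 dof has CDF 1 - exp(-x / 2)
        return -2.0 * math.log(1.0 - alpha)
    raise NotImplementedError("only n = 1 and n = 2 are handled in this sketch")


def inside_confidence_region(chi2_values, alpha, n):
    """Flag points whose chi-square lies within Delta chi^2(alpha, n) of the best fit."""
    chi2_best = min(chi2_values)
    threshold = delta_chi2(alpha, n)
    return [c - chi2_best < threshold for c in chi2_values]


if __name__ == "__main__":
    for n in (1, 2):
        print(n, [round(delta_chi2(a, n), 2) for a in (0.68, 0.95)])
```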

As an alternative measure to the posterior, in our previous work we employed a quantity
that we called the mean quality of fit (see eq. (3.1) in [?]), which is defined as the
average (over the posterior) of the chi-square. Therefore the difference between the profile
likelihood and the mean quality of fit is that in the mean quality of fit the chi-square is averaged
over the hidden dimensions, while in the profile likelihood the likelihood is maximised
(i.e., the chi-square is minimised). Numerical investigation shows that the two quantities
are very similar in the case of the CMSSM. We have chosen to adopt in this work the profile
likelihood because of its more straightforward statistical interpretation, but we point out
that our previous findings based on the mean quality of fit are very similar to what one
would have obtained using the profile likelihood instead.
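The following Python sketch makes the distinction between the two measures concrete. For each bin of one parameter it computes both the chi-square averaged over the hidden dimensions (mean quality of fit) and the minimum chi-square (the best fit behind the profile likelihood). The helper name is hypothetical, and it assumes equally weighted posterior samples, each carrying its chi-square value.

```python
import math


def binned_fit_summaries(x, chi2, lo, hi, n_bins):
    """Per bin of x, return (mean chi-square, minimum chi-square) over the samples.

    The mean is the mean quality of fit (average over the hidden dimensions);
    the minimum corresponds to maximising the likelihood, as in the profile
    likelihood. Empty bins give (None, None).
    """
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    mins = [math.inf] * n_bins
    for xi, c in zip(x, chi2):
        if lo <= xi < hi:
            k = min(int((xi - lo) / width), n_bins - 1)
            sums[k] += c
            counts[k] += 1
            mins[k] = min(mins[k], c)
    return [(s / n, m) if n else (None, None) for s, n, m in zip(sums, counts, mins)]
```

Comparing the two entries bin by bin shows where averaging and maximising along the hidden dimensions would lead to different pictures of the preferred region.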

In Bayesian statistics, the posterior pdf encodes the full information coming from 
the data and the prior. Ideally, the information in the data is much stronger than the 
information in the prior, so effectively the posterior should be dominated by the likelihood
function and the prior choice ought to be irrelevant (see fig. 2 in [?] for an illustration).
Furthermore, in this case it is easy to show that the Bayesian posterior, the profile likelihood
and the mean quality of fit all become identical, and therefore the conclusions from the
different statistical measures agree (and are uncontroversial). If the data are not strong
enough, the different statistical quantities encode different pieces of information about the 
parameters and may in general disagree, and the prior influence might come to dominate the 
result. This appears to be the case with the CMSSM with currently available constraints. 
One of the main aims of this work is to clarify the reasons for this prior and statistical 
measure dependence, and to assess how much one should be worried about it. 

2.3 Information content and constraining power 

The Bayesian evidence returned by the MultiNest algorithm can be employed in several
ways, mainly as a tool for model comparison (see, e.g., [?]). Here we employ it to quantify
the amount of information (i.e., the constraining power) of the different observables. This
is encoded in the Kullback-Leibler (KL) divergence between the prior and the posterior [23].
For ease of notation, let us denote the posterior pdf by $p$ and the prior by $\pi$, as before.
Then the KL divergence is defined as

$$D_{\rm KL}(p, \pi) \equiv \int p(m|d)\,\ln\frac{p(m|d)}{\pi(m)}\,dm. \qquad (2.9)$$



By virtue of Bayes' theorem, the KL divergence becomes the sum of the negative log-evidence
and the expectation value of the log-likelihood under the posterior:

$$D_{\rm KL}(p, \pi) = -\ln p(d) + \int p(m|d)\,\ln\mathcal{L}(m)\,dm = -\ln p(d) - \langle\chi^2\rangle/2. \qquad (2.10)$$



The first quantity on the r.h.s. is returned by the MultiNest algorithm, while computing 
the expectation value of the log-likelihood (i.e., the chi-square) is trivial from the samples. 
It is sufficient to average the chi-square over the samples. 

To gain a feeling for what the KL divergence expresses, let us compute it for a 1-dimensional
case, with a Gaussian prior around zero of variance $\Sigma^2$ and a Gaussian likelihood
centered on $m_{\max}$ and of variance $\sigma^2$. We obtain after a short calculation, to leading order
in $\sigma/\Sigma$,

$$D_{\rm KL} \approx -\frac{1}{2} + \ln\frac{\Sigma}{\sigma} + \frac{1}{2}\left(\frac{m_{\max}}{\Sigma}\right)^2. \qquad (2.11)$$

The second term on the r.h.s. gives the reduction in parameter space volume in going
from the prior to the posterior. For informative data, $\sigma/\Sigma \ll 1$, this term is positive and
grows as the logarithm of the volume ratio. On the other hand, in the same regime the
third term is small unless the maximum likelihood estimate is many standard deviations
away from what we expected under the prior, i.e. for $m_{\max}/\Sigma \gg 1$. This means that
the maximum likelihood value is "surprising", in that it is far from what our prior led us
to expect. Therefore we can see that the KL divergence is a summary of the amount of
information, or "surprise", contained in the data.
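The Gaussian example can be checked numerically with the Python sketch below (function names hypothetical). It evaluates eq. (2.10) for the 1D toy model, using the analytic evidence and the posterior average of the log-likelihood. The likelihood is normalized as a pdf for the data, so no constant is dropped. The result is compared with the closed-form KL divergence between the Gaussian posterior and the prior, and with the leading-order expression (2.11). The first two agree to rounding error, since eq. (2.10) is an identity; the approximation differs from them by terms of order σ²/Σ².

```python
import math


def kl_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form D_KL(N(mu_p, var_p) || N(mu_q, var_q))."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def kl_via_evidence(m_max, sigma, big_sigma):
    """Eq. (2.10) for the toy model: prior N(0, Sigma^2), normalized Gaussian
    likelihood of width sigma centred on m_max.
    Returns (D_KL, posterior mean, posterior variance)."""
    s2, b2 = sigma ** 2, big_sigma ** 2
    # ln p(d): a Gaussian of variance sigma^2 + Sigma^2 evaluated at m_max
    log_z = -0.5 * math.log(2.0 * math.pi * (s2 + b2)) - 0.5 * m_max ** 2 / (s2 + b2)
    # The posterior is Gaussian with these moments
    var_post = s2 * b2 / (s2 + b2)
    mu_post = var_post * m_max / s2
    # <ln L> under the posterior, using E[(m - m_max)^2] = var_post + (mu_post - m_max)^2
    mean_log_like = (-0.5 * math.log(2.0 * math.pi * s2)
                     - 0.5 * (var_post + (mu_post - m_max) ** 2) / s2)
    return -log_z + mean_log_like, mu_post, var_post


def kl_leading_order(m_max, sigma, big_sigma):
    """Eq. (2.11), valid for sigma / Sigma << 1."""
    return -0.5 + math.log(big_sigma / sigma) + 0.5 * (m_max / big_sigma) ** 2


if __name__ == "__main__":
    d_kl, mu_post, var_post = kl_via_evidence(m_max=2.0, sigma=1.0, big_sigma=10.0)
    print(d_kl, kl_gaussians(mu_post, var_post, 0.0, 100.0), kl_leading_order(2.0, 1.0, 10.0))
```

For $m_{\max} = 2$, σ = 1 and Σ = 10 the exact value is about 1.832, while eq. (2.11) gives about 1.823.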






Other quantities can be used to assess the constraining power of the data (see e.g. [15]
for a recent application), but the KL divergence has the advantage of being firmly grounded
in information theory and of having a clear interpretation.



3. Implications for the Constrained MSSM 

As a theoretical particle physics framework to illustrate our procedure we use the popular
Constrained MSSM [20]. Some of us have examined the model in the context of Bayesian
statistics before [?, 16, 11, 12]. Here we summarize its relevant features for completeness.
Below we also list, and update where applicable, the experimental constraints on
the model.

3.1 The Constrained MSSM 

In the CMSSM the parameters $m_{1/2}$, $m_0$ and $A_0$, which are specified at the GUT scale
$M_{\rm GUT} \simeq 2\times 10^{16}$ GeV, serve as boundary conditions for evolving, for a fixed value of
$\tan\beta$, the MSSM Renormalization Group Equations (RGEs) down to a low energy scale
$M_{\rm SUSY} \equiv \sqrt{m_{\tilde t_1} m_{\tilde t_2}}$ (where $m_{\tilde t_{1,2}}$ denote the masses of the scalar partners of the top quark),
chosen so as to minimize higher order loop corrections. At $M_{\rm SUSY}$ the (1-loop corrected)
conditions of electroweak symmetry breaking (EWSB) are imposed and the SUSY spectrum
is computed.

Our aim is to use experimental constraints on observational quantities defined in terms
of CMSSM parameters to infer the most probable values of the CMSSM quantities themselves
(and the associated errors). In this paper we fix the sign of $\mu$ to be positive,
in order for the model to accommodate the apparent discrepancy of the anomalous magnetic
moment of the muon between experiment and SM predictions. We then denote the
remaining four free CMSSM parameters by the set

$$\theta = (m_0, m_{1/2}, A_0, \tan\beta). \qquad (3.1)$$

As originally demonstrated in [5, ?], the values of the relevant SM parameters can strongly
influence some of the CMSSM predictions, and, in contrast to common practice, should
not be simply kept fixed at their central values. We thus introduce a set $\psi$ of so-called
"nuisance parameters", namely the SM parameters which are relevant to our analysis,

$$\psi = \left(M_t,\ m_b(m_b)^{\overline{MS}},\ \alpha_{\rm em}(M_Z)^{\overline{MS}},\ \alpha_s(M_Z)^{\overline{MS}}\right), \qquad (3.2)$$

where $M_t$ is the pole top quark mass. The other three parameters, namely $m_b(m_b)^{\overline{MS}}$ (the
bottom quark mass evaluated at $m_b$), and $\alpha_{\rm em}(M_Z)^{\overline{MS}}$ and $\alpha_s(M_Z)^{\overline{MS}}$ (respectively the
electromagnetic and the strong coupling constants evaluated at the Z pole mass $M_Z$), are all
computed in the $\overline{MS}$ scheme.

The sets of parameters $\theta$ and $\psi$ form the 8-dimensional set $m$ of our "basis parameters"
(2.1). In terms of the basis parameters we compute a number of collider and cosmological
observables, which we call "derived variables" and which we collectively denote by
the set $\xi = (\xi_1, \xi_2, \ldots)$. The observables will be used to compare CMSSM predictions with
a set of experimental data $d$, which is available either in the form of positive measurements
or as limits, as discussed below.

SM (nuisance) parameter                     Mean value μ    Uncertainty σ (exper.)    Ref.
$M_t$                                       172.6 GeV       1.4 GeV                   [?]
$m_b(m_b)^{\overline{MS}}$                  4.20 GeV        0.07 GeV                  [25]
$\alpha_s(M_Z)^{\overline{MS}}$             0.1176          0.002                     [25]
$1/\alpha_{\rm em}(M_Z)^{\overline{MS}}$    127.955         0.03                      [?]

Table 1: Experimental mean μ and standard deviation σ adopted for the likelihood function for
SM (nuisance) parameters, assumed to be described by a Gaussian distribution.



3.2 Priors, observables and data 

In order to estimate the impact of priors, we adopt two different choices of priors:

• flat priors in all the CMSSM parameters $m_{1/2}$, $m_0$, $A_0$ and $\tan\beta$;

• log priors, that are flat in $\log m_{1/2}$ and $\log m_0$, while for the other two CMSSM
parameters we keep flat priors.

As regards the ranges, in both cases we take 50 GeV < $m_{1/2}$, $m_0$ < 4 TeV, $|A_0|$ <
7 TeV and $2 < \tan\beta < 62$, as before [?]. Note that the above range of $m_0$ includes
the hyperbolic branch/focus point (FP) region [27, 25], which will play an important role
in our discussion because it is currently favored by the constraint from BR($B \to X_s\gamma$) [?].
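Nested samplers such as MultiNest typically work in a unit hypercube and leave it to the user to map each coordinate onto a physical parameter according to the chosen prior. Below is a minimal Python sketch of such a mapping for the two priors and the ranges just quoted. The function names and the parameter ordering are hypothetical, and the SM nuisance parameters are omitted.

```python
import math

# Ranges quoted in the text (masses in GeV)
M_LO, M_HI = 50.0, 4000.0   # m_{1/2} and m_0
A0_MAX = 7000.0             # |A_0|
TB_LO, TB_HI = 2.0, 62.0    # tan(beta)


def flat(u, lo, hi):
    """Map u in [0, 1] to a parameter with a flat prior on [lo, hi]."""
    return lo + u * (hi - lo)


def log_flat(u, lo, hi):
    """Map u in [0, 1] to a parameter with a prior flat in ln m on [lo, hi]."""
    return lo * (hi / lo) ** u


def cmssm_prior_transform(cube, prior="flat"):
    """Map a point of [0, 1]^4 to (m0, m12, A0, tan_beta).

    'flat': flat priors in all four parameters;
    'log' : flat in log m0 and log m12, flat in A0 and tan_beta.
    """
    if prior not in ("flat", "log"):
        raise ValueError("prior must be 'flat' or 'log'")
    u0, u12, ua, ut = cube
    mass = log_flat if prior == "log" else flat
    m0 = mass(u0, M_LO, M_HI)
    m12 = mass(u12, M_LO, M_HI)
    a0 = flat(ua, -A0_MAX, A0_MAX)
    tan_beta = flat(ut, TB_LO, TB_HI)
    return m0, m12, a0, tan_beta


if __name__ == "__main__":
    centre = [0.5, 0.5, 0.5, 0.5]
    print(cmssm_prior_transform(centre, "flat"))  # m0 = m12 = 2025 GeV at the cube centre
    print(cmssm_prior_transform(centre, "log"))   # m0 = m12 = sqrt(50 * 4000) GeV, about 447 GeV
```

The centre of the hypercube maps to the arithmetic midpoint of the mass range under the flat prior and to the geometric midpoint under the log prior. This is the volume expansion of the low-mass region discussed below.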

The rationale for our choice of priors is that they are distinctively different. In particular,
the log prior gives equal a priori weight to all decades of the parameter. For
example, with a log prior there is the same a priori probability that $m_0$ be in the range
10 GeV < $m_0$ < 100 GeV as in the range 100 GeV < $m_0$ < 1 TeV. In contrast, with a flat
prior, the latter range of mass values has instead 10 times more a priori probability than
the former. So the log prior expands the low-mass region and allows a much more refined
scan in the parameter space region where finely tuned points can give a good fit to the
data (see below). The reason why we apply different priors to $m_{1/2}$ and $m_0$ only is that
both of them play a dominant role in the determination of the masses of the superpartners
and Higgs bosons in the CMSSM.

Clearly a flat prior on a parameter set $m$ does not correspond to a flat prior on some
non-linear function of it, $\mathcal{F}(m)$. The two priors are related by

$$\tilde\pi(\mathcal{F}) = \pi(m)\left|\frac{dm}{d\mathcal{F}}\right|. \qquad (3.3)$$

Thus, in the case of a non-linear dependence $\mathcal{F}(m)$, the term $|dm/d\mathcal{F}|$ implies that an
uninformative (flat) prior on $m$ may be strongly informative about (i.e., constraining) $\mathcal{F}$.
(In a multi-dimensional case, the derivative term is replaced by the determinant of the
Jacobian for the transformation.) It follows that a flat prior on $\log m$ (i.e., the log prior)
corresponds to choosing a prior on $m$ of the form $\pi(m) \propto m^{-1}$. Therefore we expect that
the choice of the log prior will give more statistical weight to lower values of $m_{1/2}$ and $m_0$
than in the case of flat priors.

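The effect of eq. (3.3) on the mass priors can be quantified with a short Python sketch (helper names hypothetical). Under a prior flat in ln m on [lo, hi] the density is π(m) = 1/[m ln(hi/lo)], so the prior mass between a and b is ln(b/a)/ln(hi/lo), whereas a flat prior gives (b − a)/(hi − lo). For the $m_0$ range of 50 GeV to 4 TeV, the prior probability of $m_0$ < 500 GeV is about 11% with the flat prior and about 53% with the log prior. Drawing u uniformly in ln m and exponentiating provides a Monte Carlo check of the Jacobian.

```python
import math
import random


def prior_mass_flat(a, b, lo, hi):
    """P(a < m < b) for a prior flat in m on [lo, hi]."""
    return (b - a) / (hi - lo)


def prior_mass_log(a, b, lo, hi):
    """P(a < m < b) for a prior flat in ln m on [lo, hi], i.e. pi(m) = 1 / (m ln(hi/lo))."""
    return math.log(b / a) / math.log(hi / lo)


def sample_log_prior(rng, lo, hi):
    """Draw u uniformly in [ln lo, ln hi] and return m = exp(u)."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


if __name__ == "__main__":
    # Decades example from the text, on an illustrative range of 10 GeV to 1 TeV
    for a, b in ((10.0, 100.0), (100.0, 1000.0)):
        print(a, b, prior_mass_flat(a, b, 10.0, 1000.0), prior_mass_log(a, b, 10.0, 1000.0))
    # Prior probability of m_0 < 500 GeV within the scan range, flat vs log
    print(prior_mass_flat(50.0, 500.0, 50.0, 4000.0), prior_mass_log(50.0, 500.0, 50.0, 4000.0))
    # Monte Carlo check: the fraction of log-prior draws below 500 GeV should match prior_mass_log
    rng = random.Random(0)
    draws = [sample_log_prior(rng, 50.0, 4000.0) for _ in range(100_000)]
    print(sum(d < 500.0 for d in draws) / len(draws))
```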
Other choices of priors are possible, and indeed might be argued to be more theoretically
motivated from the point of view of penalizing finely tuned regions of parameter
space [13, ?]. However, one would like the final inference to be as prior-independent
as possible, and the constraints to be driven by the likelihood, rather than by theoretical
prejudices in the prior.

A related, although different, issue is the choice of the parameters with which to define
the model. One particularly well-known implementation of the CMSSM is one version
of the so-called minimal supergravity model [?], where the parameters $\tan\beta$ and $M_Z$ are
replaced by $\mu$ and $B$. This choice of parameterization has been advocated in [18, 19] as
more "fundamental". This is questionable in the case of the CMSSM, which was originally
defined in ref. [20] in terms of the parameters (3.1) as an effective theory, without
necessarily any reference to any underlying supergravity theory. More importantly, it
is obvious that robust physical conclusions should not strongly depend on one choice of
parameters of the model or another. If they do, this should serve as a warning bell that
the derived statistical implications for observable quantities, like masses and cross sections,
are not robust, in the same way as is the case with the dependence on priors. (Note that
the impact of the same type of prior, e.g., flat, for different choices of parameterization
may be very different, as implied by eq. (3.3).)

For the SM parameters we assume flat priors over relatively wide ranges: 167.0 GeV < M_t < 178.2 GeV, 3.92 GeV < m_b(m_b)^{\overline{MS}} < 4.48 GeV, 127.835 < 1/α_em(M_Z)^{\overline{MS}} < 128.075 and 0.1096 < α_s(M_Z)^{\overline{MS}} < 0.1256. This choice is expected to be irrelevant for the outcome of the analysis, since the nuisance parameters are well constrained by the data, as can be seen in table 1, where for each of the SM parameters we adopt a Gaussian likelihood with mean μ and experimental standard deviation σ. Note that, with respect to refs. [11, 12], we have updated the value of M_t.

The experimental values of the collider and cosmological observables that we apply (our derived variables) are listed in table 2, with updates relative to [12] where applicable. In our treatment of the radiative corrections to the electroweak observables M_W and sin^2 θ_eff, starting from ref. [...] we include full two-loop and known higher-order SM corrections as computed in ref. [37], as well as gluonic two-loop MSSM corrections obtained in [38]. We further update the experimental constraint from the anomalous magnetic moment of the muon (g−2)_μ, for which a discrepancy (denoted by δa_μ^{SUSY}) between measurement and SM predictions (based on e^+e^− data) persists at the level of 3.2σ [31].^ We will show that, while this constraint on its own quite strongly prefers lower values of m_0 and m_{1/2}, this is in contradiction with the impact of most other observables. Once they are also included, this preference essentially disappears.

As regards BR(B → X_s γ), with the central values of the SM input parameters as given in table 1, for the new SM prediction we obtain the value (3.12 ± 0.21) × 10^{-4}.^ We compute



^Evaluations done by different groups using e^+e^− data give slightly different values, but they all remain close to the value given in table 2 [...]. On the other hand, using τ data leads to much better agreement with experiment, δa_μ^{SUSY} = (8.9 ± 0.95) × 10^{-10}.

^The value of (3.15 ± 0.23) × 10^{-4} originally derived in ref. [...] was obtained for slightly different values of M_t and α_s(M_Z)^{\overline{MS}}. Note that, in treating the error bar, we have explicitly taken into account
Observable                 Mean value       Uncertainties
                                            σ (exper.)       τ (theor.)
M_W                        80.398 GeV       25 MeV           15 MeV
sin^2 θ_eff                0.23153          16 × 10^{-5}     15 × 10^{-5}
δa_μ^{SUSY} × 10^{10}      29.5             8.8              1.0
BR(B → X_s γ) × 10^4       3.55             0.26             0.21
ΔM_{B_s}                   17.77 ps^{-1}    0.12 ps^{-1}     2.4 ps^{-1}
BR(B_u → τν) × 10^4        1.32             0.49             0.38
Ω_χ h^2                    0.1099           0.0062           10% (see text)

Observable                 Limit (95% CL)                    τ (theor.)
BR(B_s → μ^+μ^−)           < 5.8 × 10^{-8}                   14%
m_h                        > 114.4 GeV (SM-like Higgs)       3 GeV
ζ_h^2                      f(m_h) (see text)                 negligible
m_q̃                        > 375 GeV                         5%
m_g̃                        > 289 GeV                         5%
other sparticle masses     As in table 4 of ref. [...]

Table 2: Summary of the observables used in the analysis. Upper part: observables for which a positive measurement has been made. δa_μ^{SUSY} = a_μ^{expt} − a_μ^{SM} denotes the discrepancy between the experimental value and the SM prediction of the anomalous magnetic moment of the muon (g−2)_μ. As explained in the text, for each quantity we use a likelihood function with mean μ and standard deviation s = sqrt(σ^2 + τ^2), where σ is the experimental uncertainty and τ represents our estimate of the theoretical uncertainty. Lower part: observables for which only limits currently exist. The likelihood function is given in ref. [...], including in particular a smearing out of experimental errors and limits to include an appropriate theoretical uncertainty in the observables. m_h stands for the light Higgs mass, while ζ_h^2 ≡ g^2(hZZ)_{MSSM}/g^2(hZZ)_{SM}, where g stands for the Higgs coupling to the Z and W gauge boson pairs.



the SUSY contribution to BR(B → X_s γ) following the procedure outlined in refs. [42, 43], which was extended in refs. [...] to the case of general flavor mixing. In addition to the full leading-order corrections, we include large tan β-enhanced terms arising from beyond-leading-order corrections, and further include (subdominant) electroweak corrections.
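For the measured observables in the upper part of table 2, the likelihood is Gaussian with total width s = sqrt(σ^2 + τ^2). The following minimal Python sketch (function names are hypothetical) computes the corresponding χ² = −2 ln L, up to a constant, for two rows of table 2, evaluated for illustration at a point with no SUSY contribution: δa_μ^{SUSY} = 0 and BR(B → X_s γ) at the SM value (3.12 × 10^{-4}) quoted above.

```python
import math

# (mean, sigma_exp, tau_theory) for two rows of table 2, in the table's units.
TABLE2 = {
    "delta_a_mu x 1e10": (29.5, 8.8, 1.0),
    "BR(B->Xs gamma) x 1e4": (3.55, 0.26, 0.21),
}

def total_sigma(sigma, tau):
    """s = sqrt(sigma^2 + tau^2): experimental and theoretical errors in quadrature."""
    return math.hypot(sigma, tau)

def chi2(prediction, mean, sigma, tau):
    """-2 ln L, up to an additive constant, for a Gaussian likelihood of width s."""
    return ((prediction - mean) / total_sigma(sigma, tau)) ** 2

# Point with no SUSY contribution: delta_a_mu^SUSY = 0 and
# BR(B->Xs gamma) at the SM value 3.12e-4 quoted in the text.
print(f"GM2 chi^2: {chi2(0.0, *TABLE2['delta_a_mu x 1e10']):.2f}")      # 11.09
print(f"BSG chi^2: {chi2(3.12, *TABLE2['BR(B->Xs gamma) x 1e4']):.2f}")  # 1.66
```

Such a point pays χ² ≈ 11 from (g−2)_μ but only about 1.7 from BR(B → X_s γ). This is one simple way to see why the GM2 constraint on its own pulls the fit towards light superpartners.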

The parametric uncertainty involved in the computation of BR(B_u → τν) comes from using |V_ub| = (4.34 ± 0.38) × 10^{-3} [32], obtained from inclusive semileptonic B decays, through the central value of m_b(m_b)^{\overline{MS}}. For τ_B we use 1.643 ± 0.01 ps [32] and f_B = 0.216 ± 0.022 GeV [...], and obtain BR(B_u → τν)^{SM} = (1.56 ± 0.38) × 10^{-4}. For the B_s–B̄_s oscillations we use the SM parametric uncertainty given by the global fit from the UTfit collaboration [47].
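The quoted central value can be reproduced with the standard tree-level SM expression BR = G_F^2 m_B m_τ^2 (1 − m_τ^2/m_B^2)^2 f_B^2 |V_ub|^2 τ_B / (8π). The text does not state which expression it uses, so this is only a consistency check of the central value. The inputs f_B, |V_ub| and τ_B are those given above. G_F, the B^+ and τ masses and ħ are standard values assumed here. The function name is hypothetical.

```python
import math

G_F = 1.16637e-5        # Fermi constant, GeV^-2 (assumed standard value)
HBAR = 6.582119569e-25  # hbar in GeV s, to convert the lifetime to GeV^-1
M_B = 5.279             # B+ mass in GeV (assumed)
M_TAU = 1.777           # tau mass in GeV (assumed)

def br_b_taunu_sm(f_b, v_ub, tau_b_ps):
    """Tree-level SM branching ratio of B_u -> tau nu.

    f_b in GeV, tau_b_ps in picoseconds (converted to GeV^-1 below).
    """
    tau_b = tau_b_ps * 1e-12 / HBAR
    helicity = (1.0 - M_TAU**2 / M_B**2) ** 2  # helicity-suppression factor
    return (G_F**2 * M_B * M_TAU**2 / (8.0 * math.pi)
            * helicity * f_b**2 * v_ub**2 * tau_b)

br = br_b_taunu_sm(f_b=0.216, v_ub=4.34e-3, tau_b_ps=1.643)
print(f"BR(B_u -> tau nu)_SM = {br:.3e}")  # about 1.56e-4
```

Because BR scales as f_B^2 |V_ub|^2, the roughly 10% uncertainty on f_B and 9% uncertainty on |V_ub| dominate the parametric error.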

Regarding cosmological constraints, we use the determination of the relic abundance of cold DM based on the 5-year data from WMAP [34] to constrain the relic abundance Ω_χ h^2 of the lightest neutralino. In order to be conservative, we employ the constraint reported in table 1 of ref. [34] (mean value), obtained using WMAP data alone. The relic abundance



the dependence on M_t and α_s(M_Z), which in our approach are treated parametrically. This has led to a slight reduction of its value.
(assuming the neutralino is the sole constituent of dark matter) is computed with high precision, including all resonance and coannihilation effects, through MicrOMEGAs [48], adding a 10% theoretical error in order to remain conservative. Note that our estimated theoretical uncertainty is of the same order as the uncertainty from current cosmological determinations of Ω_CDM h^2.

We further include in our likelihood function an improved 95% CL limit on BR(B_s → μ^+μ^−) and a recent value of B_s–B̄_s mixing, ΔM_{B_s}, which has been precisely measured at the Tevatron by the CDF Collaboration [...]. In both cases we use expressions from ref. [45], which include the dominant large tan β-enhanced beyond-LO SUSY contributions from Higgs penguin diagrams. Unfortunately, theoretical uncertainties, especially in lattice evaluations of f_{B_s}, are still substantial (as reflected in table 2 in the estimated theoretical error for ΔM_{B_s}), which makes the impact of this precise measurement on constraining the CMSSM parameter space rather limited.^

For the quantities for which positive measurements have been made (as listed in the upper part of table 2), we assume a Gaussian likelihood function with a variance given by the sum of the theoretical and experimental variances, as motivated by eq. (3.3) in ref. [...]. For the observables for which only lower or upper limits are available (as listed in the bottom part of table 2) we use a smoothed-out version of the likelihood function that accounts for the theoretical error in the computation of the observable; see eq. (3.5) and fig. 1 in ref. [...]. In particular, in applying the lower mass bound from LEP-II on the Higgs boson we take into account its dependence on the Higgs coupling to Z boson pairs, ζ_h, as described in detail in ref. [...]. When ζ_h ≃ 1, the LEP-II lower bound of 114.4 GeV (95% CL) applies. For arbitrary values of ζ_h we apply the LEP-II 95% CL bounds on m_h and m_A, which we translate into the corresponding 95% CL bound in the (m_h, ζ_h) plane. We then add a conservative theoretical uncertainty τ(m_h) = 3 GeV, following eq. (3.5) in ref. [...]. We will see that employing the full likelihood function in the (m_h, ζ_h) plane allows us to discover some regions that evade the 114.4 GeV lower bound and that would not have been seen in a scan that simply cut off all the points below the limit.
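Eq. (3.5) of the cited reference is not reproduced here, so the following is only a plausible sketch of what "smoothing" a hard limit means. The step function of a hard cut is replaced by a Gaussian cumulative distribution of width τ. The erfc form and the function name are assumptions, and the dependence of the Higgs bound on ζ_h is ignored; the limit and τ(m_h) = 3 GeV are taken from table 2.

```python
import math

def smeared_lower_limit(x, limit, tau):
    """Assumed smoothed likelihood for a lower limit `limit` on a quantity
    predicted as `x` with Gaussian theoretical error `tau`.

    A hard cut would be a step (0 below the limit, 1 above); here the step is
    replaced by the Gaussian CDF, i.e. the probability that the true value
    exceeds the limit given the prediction x.
    """
    return 0.5 * math.erfc((limit - x) / (math.sqrt(2.0) * tau))

# SM-like Higgs: 114.4 GeV limit with tau(m_h) = 3 GeV.
for mh in (108.0, 111.4, 114.4, 117.4, 120.0):
    print(f"m_h = {mh:5.1f} GeV  ->  L = {smeared_lower_limit(mh, 114.4, 3.0):.3f}")
```

Points a few GeV below the nominal bound keep a small but non-zero likelihood instead of being discarded outright. This is the behaviour that lets a scan find regions a hard cut would miss.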

Finally, points that do not fulfil the conditions of radiative EWSB and/or give non-physical (tachyonic) solutions are discarded.



4. Effect of priors and of different observables 

We now turn to the discussion of the effects of priors and experimental observables on the 
CMSSM parameter inference using Bayesian statistics and profile likelihood. We begin 
with some general remarks. 

The choice of a prior pdf implies a certain measure on the parameter space defined by m. For example, the log prior will give less a priori weight to larger values of m_{1/2} and m_0, thus reducing the preference for the FP region. Most importantly, the flat measure imposed on the basis parameter space via the choice of priors does not correspond to a flat measure over the space of the observable quantities

^On the other hand, in the MSSM with general flavor mixing, even with the current theoretical uncertainties, the bound from ΔM_{B_s} is in many cases much more constraining than those from other rare processes [...].
Shortcut   Observables included in data set
PHYS       Physicality constraints (no tachyons, EWSB, neutralino LSP)
NUIS       M_t, m_b(m_b)^{\overline{MS}}, α_s(M_Z)^{\overline{MS}}, 1/α_em(M_Z)^{\overline{MS}}
COLL       m_h and sparticle masses (limits)
CDM        Ω_χ h^2
BSG        BR(B → X_s γ)
GM2        δa_μ^{SUSY}
EWO        sin^2 θ_eff, M_W
BPHYS      ΔM_{B_s}, BR(B_s → μ^+μ^−), BR(B_u → τν)
ALL        All of the above

Table 3: Shortcuts for different data combinations applied in the analysis. The actual data employed in the numerical analysis are given in tables 1 and 2.

since these are in general strongly non-linear functions of the chosen set of model parameters. Conversely, comparing observable quantities with experimental data leads to rather complicated implications for the basis parameters.

If the data are constraining enough, the effect of the likelihood dominates over that of the prior, and one expects the prior dependence to be negligible in the final inference (based on the posterior pdf). Below we examine to what extent this is the case in the CMSSM. We note that the CMSSM is one of the most economical phenomenological models on the table. More complex models (with more free parameters) are qualitatively expected to compound the problem, given that, as we will show below, current constraints are not sufficiently strong to allow drawing prior-independent conclusions.

As regards experimental observables, since we will be interested in comparing the constraining power of different combinations of data, it is convenient to designate them by shortcuts. These are given in table 3.
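Assuming, as is standard, that the individual likelihoods are independent, a data combination such as PHYS+NUIS+COLL+CDM corresponds to multiplying likelihoods, i.e. adding χ² terms. PHYS acts as a hard cut rather than as a likelihood term. The following is a minimal sketch: the per-point χ² values are invented placeholders, and `total_chi2` is a hypothetical helper.

```python
# Placeholder chi^2 contributions of one hypothetical parameter-space point,
# keyed by the table 3 shortcuts. The numbers are invented for illustration.
CHI2_TERMS = {"NUIS": 0.5, "COLL": 0.0, "CDM": 1.0, "BSG": 2.0,
              "GM2": 10.0, "EWO": 1.0, "BPHYS": 1.5}

def total_chi2(dataset, terms=CHI2_TERMS):
    """Return -2 ln L (up to a constant) for a combination like "PHYS+NUIS+COLL+CDM".

    Independent likelihoods multiply, so their chi^2 terms add. PHYS is a
    physicality cut (failing points are discarded), not a likelihood term.
    "ALL" expands to every shortcut in table 3.
    """
    names = list(terms) if dataset == "ALL" else dataset.split("+")
    total = 0.0
    for name in names:
        if name == "PHYS":
            continue
        total += terms[name]  # raises KeyError for an unknown shortcut
    return total

print(total_chi2("PHYS+NUIS+COLL+CDM"))      # 1.5
print(total_chi2("PHYS+NUIS+COLL+CDM+GM2"))  # 11.5
print(total_chi2("ALL"))                     # 16.0
```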

4.1 Impact of priors 

In this subsection we explore the impact of the flat and the log priors on the CMSSM parameters and on the predictions for the observable quantities. To set the stage, we perform a scan of the basis parameter space without imposing any experimental constraints at all, i.e., we take a constant likelihood function. We only discard points suffering from unphysicalities: no self-consistent solutions to the RGEs, no EWSB, or tachyonic states. Furthermore, we require the neutralino to be the LSP, so that it can be the dark matter. Therefore the final list of samples only contains physical points in parameter space. Without the physicality constraint, we would expect such a scan to return a posterior identical to the prior, i.e., flat in the variables over which a flat prior has been imposed.

In fig. 1 we present the implications for the 1D distributions of the posterior (dashed blue) and the profile likelihood (solid red) for the CMSSM parameters with only the physicality constraint imposed (PHYS). In the four leftmost panels we assume flat priors, while in the four rightmost panels we assume log priors. (For all the SM nuisance parameters both
Figure 1: A scan including no experimental data, but only the requirement of physicality (PHYS). Two columns of panels on the left: 1D posterior distribution (dashed blue) and 1D profile likelihood (solid red) for the CMSSM parameters for the flat priors case. Two columns of panels on the right: the same quantities but for the log priors case. The plots reflect only the prior distributions of the CMSSM parameters and the physicality constraints.



distributions are basically flat over the prior range of the SM parameters, and we do not show them here.) Notice that the lack of samples in certain regions of parameter space, as induced by the physicality constraints, shows up in the posterior pdf as a reduction of the marginalised probability for that region. Thus, for the flat priors case, the drop at low m_0 and large m_{1/2} is primarily caused by the fact that in that region the LSP is the stau, and hence our requirements for physical points are not met. On the other hand, the gradual decrease in the posterior of tan β reflects the increasing difficulty for the RGEs to find self-consistent solutions. Eventually, for tan β above about 62, the Yukawa coupling of the top quark grows to non-perturbative values before the GUT scale is reached and no solutions are found anymore, as was explained in [...]. For the log priors case, the increased a priori probability for small values of m_0 compensates the above effects, while the large m_0 region is now suppressed. The same trend is even more evident for m_{1/2}, where the marginal posterior pdf closely follows the dependence ∝ 1/m_{1/2} expected from a log prior. In contrast, the profile likelihood remains flat across all the CMSSM parameters. This is precisely what one would expect, since no data have been employed.
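The difference between the two statistics in a scan with constant likelihood can be reproduced with a toy model. The marginal posterior integrates over the hidden direction, so it inherits both the prior and the volume removed by the physicality cut. The profile likelihood only takes the maximum over that direction and stays flat. The two-parameter grid, the wedge-shaped cut standing in for the stau-LSP region, and all names below are hypothetical.

```python
# Toy grid: x stands in for m_0, y for m_1/2 (arbitrary units, hypothetical).
N = 200
xs = [0.05 + 3.95 * (i + 0.5) / N for i in range(N)]
ys = list(xs)

def physical(x, y):
    """Hypothetical physicality cut: reject small x at large y
    (a crude stand-in for the region where the stau is the LSP)."""
    return x > 0.3 * y

def log_prior(x, y):
    """Log prior in both parameters: density proportional to 1/(x*y)."""
    return 1.0 / (x * y)

def likelihood(x, y):
    """No data imposed: constant likelihood."""
    return 1.0

marginal, profile = [], []
for x in xs:
    allowed = [y for y in ys if physical(x, y)]
    # Marginal posterior: sum (integrate) prior * likelihood over y.
    marginal.append(sum(log_prior(x, y) * likelihood(x, y) for y in allowed))
    # Profile likelihood: maximise the likelihood over y.
    profile.append(max((likelihood(x, y) for y in allowed), default=0.0))

norm = sum(marginal)
marginal = [m / norm for m in marginal]
print(f"marginal posterior, first/last bin ratio: {marginal[0] / marginal[-1]:.1f}")
print(f"profile likelihood range: {min(profile)} to {max(profile)}")
```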

The above points can be confirmed by looking at the corresponding 2D distributions, which are shown in fig. 2. There we plot samples drawn with uniform weight from the prior (once the physicality constraints have been imposed); hence the density of samples reflects the prior pdf.

It is interesting to consider the implied distribution for the observable quantities. This 
Figure 2: A scan including no experimental data, but only the requirement of physicality (PHYS), for flat priors (panels in the left two columns) and log priors (panels in the right two columns). Samples are drawn with equal weight from the prior, hence their density reflects the 2D probability for different projections on the CMSSM parameters.



can be understood as a predictive distribution for the observables from the priors and the physicality constraints. In fig. 3 we present the 1D distributions of the posterior (dashed blue) and the profile likelihood (solid red) for the quantities which will play the most important role in constraining the basis parameters. For comparison, for each observable we also display the likelihood function (dotted black), which however has not been imposed in this scan. The two left (right) columns are for the flat (log) prior.

Starting from the CDM abundance, we note that, in the absence of constraints from the data, for both choices of priors the neutralino relic density is typically much larger than unity, as is well known. When we later impose the WMAP constraint (see below), we therefore expect the posterior to be dominated by the likelihood, since the prior is much wider (by orders of magnitude) than the likelihood. We also note that, in contrast, the profile likelihood remains flat out to much larger values, a reflection of the fact that the Bayesian posterior is suppressed because only a small number of samples is found with an extremely large relic abundance (Ω_χ h^2 ≳ 100).

On the other hand, the posterior for δa_μ^{SUSY} is very strongly peaked around zero. This
Figure 3: A scan including no experimental data, but only the requirement of physicality (PHYS). The posterior probability distribution (dashed blue) and the profile likelihood (solid red) for the most constraining observables (with flat priors on the left, and log priors on the right): the DM relic abundance Ω_χ h^2 of the neutralino, the excess in the anomalous magnetic moment of the muon δa_μ^{SUSY}, BR(B → X_s γ) and the lightest Higgs mass m_h. For comparison, the dotted black, smooth curves give the likelihood function for the plotted observable (not imposed in this scan). For the DM abundance, the likelihood function plotted shows only the experimental error (i.e., it does not include the theoretical error employed in the scan).



is a consequence of the overwhelming number of samples in the FP region, where the large superpartner masses lead to a strong suppression of the SUSY contribution to δa_μ^{SUSY}. Even the log prior can give only slight extra weight to the pdf for larger values of δa_μ^{SUSY}. Again, the profile likelihood is unaffected by the choice of priors.

Similar reasoning can also explain the fairly strong peak in the posterior for BR(B → X_s γ) at ~3 × 10^{-4}, below the SM central value. This is the result of the negative (for μ > 0) chargino/stop contribution often overriding the always positive charged Higgs/top contribution. Finally, a large concentration of samples at large m_{1/2} and m_0 also accounts for the fairly strongly peaked distribution in the pdf of the lightest Higgs mass m_h. In contrast, the profile likelihood is not affected by such volume effects and remains flat, except for a small dip at m_h ~ 88 GeV, well below the LEP limit (where the scan has not found any point satisfying the physicality constraints). This is likely to be a consequence of the finite number of samples we could gather.

In fig. 4 we plot the predictive distribution from the prior for the EW precision observables and B-physics quantities. Notice how, for both choices of priors, the marginal pdf implied by the prior (dashed blue) is typically much more strongly peaked than the likelihood function (dotted black). This means that the constraining power of the data for these quantities is expected to be smaller than the information already implied by the prior (see
Figure 4: As in fig. 3, but for some other observables. No experimental constraints have been imposed, only the requirement of physicality (PHYS), for both flat priors (left panels) and log priors (right panels). We plot the posterior probability distribution (dashed blue) and the profile likelihood (solid red). For comparison, the dotted black, smooth curves give the likelihood function for the plotted observable (not imposed in this scan). The range of the profile likelihood (solid red line) gives the range of values for the quantities covered by the scan, as a consequence of the priors presented in section 3.



section 5.2 for more details). Therefore, as we shall explicitly show below, the impact of including them in the likelihood will be fairly limited.
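Assume, for this illustration, that the information-theoretical measure of section 5.2 is of the Kullback-Leibler type. The argument can then be made concrete with Gaussians. When the prior-predictive distribution of an observable is already much narrower than its likelihood, updating barely changes it and the information gain is close to zero. A minimal Python sketch with hypothetical widths and function names:

```python
import math

def gaussian_update(mu_p, sig_p, mu_l, sig_l):
    """Posterior mean and width for a Gaussian prior times a Gaussian likelihood."""
    precision = 1.0 / sig_p**2 + 1.0 / sig_l**2
    mu = (mu_p / sig_p**2 + mu_l / sig_l**2) / precision
    return mu, math.sqrt(1.0 / precision)

def kl_gauss(mu_q, sig_q, mu_p, sig_p):
    """Kullback-Leibler divergence D(q || p), in nats, between 1D Gaussians q and p."""
    return (math.log(sig_p / sig_q)
            + (sig_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sig_p**2) - 0.5)

# Likelihood of unit width; prior-predictive either much narrower or much wider.
for sig_prior in (0.2, 10.0):
    mu_post, sig_post = gaussian_update(0.0, sig_prior, 0.0, 1.0)
    gain = kl_gauss(mu_post, sig_post, 0.0, sig_prior)  # D(posterior || prior)
    print(f"prior width {sig_prior:5.1f}: information gain {gain:.3f} nats")
```

The narrow prior-predictive case gains almost nothing from the data. The wide case gains close to two nats.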

To summarize, the key point is that, as we emphasized at the beginning of this section, in the CMSSM (and, more generally, in the class of effective SUSY models where input parameters are defined at some high scale), the connection between the basis parameters and the observable quantities (other than the nuisance parameters, which obviously are directly constrained) is highly non-linear. Therefore the data, although fairly strongly constraining some of the observables, can only give indirect constraints on the parameters of the model, since the parameters can be moved around in order to satisfy a given constraint. Plotting the posterior for the observables in the absence of data therefore shows the amount by which the prior measure impacts on the observable quantities. Another way of interpreting the above behavior is as the prior-predictive distribution for the observable quantities, i.e., the probability distribution for the observables implied by the choice of priors.
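A prior-predictive distribution is obtained by pushing prior samples through the map from parameters to observables. The toy below is a minimal sketch. It assumes a flat prior on a single mass and an observable that decouples as the inverse square of that mass, loosely mimicking a SUSY correction that vanishes for heavy spectra. The range, the functional form and the names are all hypothetical.

```python
import random

def prior_predictive(n=100_000, seed=0):
    """Samples of a toy observable implied by a flat prior on a mass M.

    Hypothetical setup: M is uniform on [0.05, 4] (TeV) and the observable
    decouples like obs = (0.1 / M)**2.
    """
    rng = random.Random(seed)
    return [(0.1 / rng.uniform(0.05, 4.0)) ** 2 for _ in range(n)]

samples = prior_predictive()
# obs < 0.01 is equivalent to M > 1 TeV; analytically (4 - 1) / 3.95, about 0.76.
frac_small = sum(s < 0.01 for s in samples) / len(samples)
print(f"fraction of prior samples with obs < 0.01: {frac_small:.3f}")
print(f"range of obs covered: {min(samples):.5f} to {max(samples):.2f}")
```

Roughly three quarters of the samples sit in the bottom 0.25% of the observable's range. A flat prior on the parameter therefore produces a sharply peaked distribution for the observable, with no data involved.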

4.2 Impact of collider data, CDM abundance, b → sγ and δa_μ^{SUSY}

We now move on to adding the other constraint sets from table 3 and investigate how they influence the conclusions obtained above for the two statistical measures and for our choices of priors.
First, in fig. 5 we show the CMSSM parameters (as in fig. 1) but now with data on SM nuisance parameters, collider limits on Higgs and superpartner masses and the WMAP5 CDM abundance determination added to the likelihood (PHYS+NUIS+COLL+CDM). The corresponding 2D posterior pdf and profile likelihood for some of the CMSSM variable combinations are shown in fig. 6.
Figure 5: As in fig. 1, but now adding the constraint on SM nuisance parameters, collider limits on Higgs and superpartner masses and the WMAP5 CDM abundance determination (PHYS+NUIS+COLL+CDM), for flat/log priors (panels in the two left/right columns). The thin vertical line is the posterior mean, the red cross the best-fit point. The horizontal bars on the top express the constraints on the parameters graphically: the top bar gives 68% (green) and 95% (red) limits from the profile likelihood, while the bar below it gives 68% (green) and 95% (blue) intervals from the marginal pdf.



By examining both figures, it is clear that the resulting constraints on the CMSSM parameters depend very much on the chosen statistical measure. For example, while in the log prior case the posterior pdf shows a stronger preference for smaller m_0 than with the flat prior (and a strong peak at small m_0), the profile likelihood remains essentially flat across all CMSSM parameters for both choices of priors. This is an indication that the data employed do not provide sufficient constraints on the parameters. More generally, we can see that the profile likelihood gives more conservative limits than the posterior pdf. These features can also be seen in fig. 6 (2D distributions). The 95% contours are broadly similar for both statistics for a given choice of prior, but are quite different for the two priors. In general, the log prior favors the low-energy region more strongly. We have also found that the chi-square of the best-fit point (indicated by a cross) is lower for the log prior scan than for the flat prior scan. There are also evident differences between the location of the best-fit point and the posterior mean (indicated by a filled dot). This results from the fact that the posterior mean is influenced by the posterior distribution and its associated volume, whose distribution depends fairly strongly on the chosen prior.
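The gap between best-fit point and posterior mean, and its prior dependence, can be illustrated with a one-dimensional toy. A narrow, high likelihood peak carries the best fit, while a broad plateau carries most of the posterior volume and drags the mean. The likelihood shape and all names are hypothetical. The flat and 1/m weights mirror the flat and log priors discussed above.

```python
import math

def likelihood(m):
    """Hypothetical 1D likelihood in a mass m (TeV): a narrow, high peak at
    low m plus a broad, lower plateau at high m."""
    spike = math.exp(-0.5 * ((m - 0.3) / 0.03) ** 2)
    plateau = 0.6 * math.exp(-0.5 * ((m - 2.5) / 0.8) ** 2)
    return spike + plateau

grid = [0.05 + 3.95 * (i + 0.5) / 4000 for i in range(4000)]
like = [likelihood(m) for m in grid]

# Best fit: maximum of the likelihood, which does not depend on the prior.
best_fit = grid[max(range(len(grid)), key=like.__getitem__)]

def posterior_mean(prior):
    """Mean of m under a posterior proportional to prior(m) * L(m), on the grid."""
    w = [prior(m) * l for m, l in zip(grid, like)]
    return sum(m * wi for m, wi in zip(grid, w)) / sum(w)

mean_flat = posterior_mean(lambda m: 1.0)     # flat prior
mean_log = posterior_mean(lambda m: 1.0 / m)  # log prior, pi(m) proportional to 1/m
print(f"best fit {best_fit:.2f}, posterior mean: flat {mean_flat:.2f}, log {mean_log:.2f}")
```

The best fit stays at the spike for both priors, while the posterior mean lies far from it and moves when the prior changes.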

On the other hand, the nuisance parameters are already at this point extremely well 
constrained by the Gaussian likelihood, for both the Bayesian pdf and the profile likelihood 
statistics. The two statistics are almost identical for those variables and equal to the 
experimental likelihood, hence we do not show them here. 
Figure 6: Posterior pdf (left two columns) and profile likelihood (right two columns) for flat priors (top row) and log priors (bottom row) for a scan including SM nuisance parameters constraints, collider limits on Higgs and superpartner masses and the WMAP5 CDM abundance determination (PHYS+NUIS+COLL+CDM). The inner and outer contours enclose the respective 68% and 95% joint regions for both statistics. The posterior pdf has been smoothed with a Gaussian kernel of 1 bin width for display purposes. The cross gives the best-fit point, the filled circle is the posterior mean.

Next we add the BR(B → X_s γ) constraint (PHYS+NUIS+COLL+CDM+BSG) in figs. 7 (1D distribution) and 8 (2D distribution). This moves the region preferred by the profile likelihood towards large m_0 (the FP region), for the flat and, to a lesser extent, the log prior.^ However, the posterior pdf still suffers from a strong prior dependence: the flat prior clearly gives more weight to larger m_0, while the log prior case strongly prefers lower m_0 and, to a lesser extent, lower m_{1/2}, a reflection of the larger a priori probability given to lower ranges of both parameters. Constraints on tan β also depend on the prior and on the choice of statistical measure.

In order to examine the impact of the anomalous magnetic moment of the muon, in figs. 9 (1D distribution) and 10 (2D distribution) we replace the constraint from BR(B →

^The reason why the BR(B → X_s γ) constraint favors the FP region can be seen as follows. Starting from the SM central value of 3.12 × 10^{-4}, the always positive charged Higgs/top contribution has to be large enough that, when combined with the negative (for μ > 0) chargino/stop contribution, the total ends up around the experimental central value of 3.55 × 10^{-4}. This requires the charged Higgs to be light enough and also the stop (or chargino) to be heavy enough. Both conditions are satisfied in the FP region. Of course, the above argument is somewhat oversimplified, as it does not take into account the associated error bars on the above values, but it does explain the basic mechanism, which remains dominant in a full numerical analysis [12].
Figure 7: As in fig. 5, but with an additional constraint from BR(B → X_s γ) (PHYS+NUIS+COLL+CDM+BSG).




Figure 8: As in fig. 6, but with an additional constraint from BR(B → X_s γ) (PHYS+NUIS+COLL+CDM+BSG).



X_s γ) with δa_μ^{SUSY} (PHYS+NUIS+COLL+CDM+GM2). For both statistical measures, this moves the preferred regions to lower masses, m_0, m_{1/2} ≲ 1 TeV. While there is some residual prior dependence in the posterior pdf, the profile likelihood is now almost independent of the prior, and the constraints on all parameters are largely reconciled for both statistics and prior measures. This means that, in the absence of the constraint from BR(B → X_s γ), the constraining power of the δa_μ^{SUSY} observable is rather strong.

However, such a strong constraint comes at the price of a tension with other observables which have not been included in this scan, especially BR(B → X_s γ). This is shown in fig. 11 for the log prior (the case of the flat prior is qualitatively similar). As before, the posterior pdf is shown in dashed blue, the profile likelihood in solid red and the likelihood (data) in dotted black. The DM abundance and δa_μ^{SUSY} are well constrained, and both statistics are in agreement with the likelihood. But both the posterior and the profile likelihood for BR(B → X_s γ) peak at a very low value, well below the SM value, reflecting a sizeable negative contribution from SUSY corrections. This is in strong disagreement with the observed likelihood. The other two B-physics observables exhibit a similar tension as well. Hence we expect that, once BR(B → X_s γ) and the other constraints are applied, both the pdf and the profile likelihood will shift considerably and the δa_μ^{SUSY} constraint will produce a tension with the other data.^ We will discuss the tension between δa_μ^{SUSY} and the other observables in more detail in the next section.
Figure 9: As in fig. 5, but with an additional constraint from δa_μ^{SUSY} instead of BR(B → X_s γ) (PHYS+NUIS+COLL+CDM+GM2).



4.3 Combined impact of all observables 

Finally, we examine the combined effect of all the constraints listed in table 3 (ALL). The corresponding plots for the CMSSM parameters are shown in figs. 12 (1D distributions) and 13 (2D distributions). In the case of the flat prior (two leftmost columns), both the posterior pdf and the profile likelihood show a clear preference for large m_0 and large, but not as much, m_{1/2} (the FP region), as well as a fairly narrow peak at small m_0 (the stau



^An interesting oddity is the long tail of the profile likelihood for values m_h < 114 GeV. This is caused by the fact that, in that case, the light Higgs coupling ζ_h becomes suppressed, thus evading the LEP limits on the SM-like Higgs mass (such points also correspond to large values of BR(B_u → τν), well above the observed value, which however has not been imposed in this scan). Note that this does not show up in the Bayesian pdf, because there is only a small number of samples with non-SM-like coupling.
Figure 10: As in fig. 6, but with an additional constraint from δa_μ^{SUSY} instead of BR(B → X_s γ) (PHYS+NUIS+COLL+CDM+GM2).
Figure 11: As in fig. 9 (PHYS+NUIS+COLL+CDM+GM2), but for several observable quantities. Only the log priors case is shown here; the flat prior case is qualitatively similar.



coannihilation region). Both statistical measures also appear to favor non-zero, positive A_0. On the other hand, the posterior shows a peak at large tan β ~ 55, although at 95% confidence both the posterior and especially the profile likelihood allow a wide spread of values, down to about 10 (where the profile likelihood shows another peak) and even less. Turning next to the log prior (two rightmost columns), the posterior for m_0 is now more strongly peaked at small values, while the probability for larger values is suppressed (again as expected from a log prior). In contrast, the profile likelihood continues to indicate a preference for large m_0 > 1 TeV, in the FP region. On the other hand, for both statistical measures the preferred ranges of m_{1/2} have moved towards smaller values, as expected from the log prior, although the profile likelihood is qualitatively similar to the flat prior case. In contrast, the distributions for A_0 have not changed dramatically, while the bimodality in those for tan β is somewhat stronger and shows more preference for lower values. We remind the reader that, for both choices of priors, we have used flat distributions in both A_0 and tan β.

It is clear that figs. 12 and 13 are qualitatively similar to figs. 7 and 8 (which show the impact of including BR(B → X_s γ) but not δa_μ^{SUSY}), and significantly different from figs. 9 and 10 (which show the impact of including δa_μ^{SUSY} but not BR(B → X_s γ)). This is yet another reflection of the strong tension between δa_μ^{SUSY} and the other constraints, mostly BR(B → X_s γ), which in the end override the impact of δa_μ^{SUSY} to a large extent.

The corresponding plots for several observables are shown in fig. 14. It is instructive to compare them with the corresponding panels in fig. 11 (where δa_μ^{SUSY} was included but not BR(B → X_s γ)) for the log prior. Again, we see a large shift in the distributions of δa_μ^{SUSY} (which now shows a strong peak in the posterior pdf near zero and a more spread-out distribution for the profile likelihood). On the other hand, the distributions for BR(B → X_s γ) and m_h now agree much better with the experimental data (for both statistical measures). The same remains broadly true for the other observables shown in fig. 14.



By examining the combined effect of all the constraints both on the CMSSM parameters and on the observables themselves (figs. 12, 13 and 14), we conclude that the precise constraints depend on both the statistics and the prior choice, although broad trends are apparent. This means that the combined data are not yet sufficiently strong to completely override the prior dependence. Comparing the profile likelihood for the two priors, we see that it suffers much less from prior dependence. From fig. 14 we notice that both the posterior and the profile likelihood for all of the EW and B-physics observables are much narrower than the likelihood, a clear sign that they are dominated by the prior distribution and that the effect of the data is solely to cut away the points preferred by $\delta a_\mu^{\rm SUSY}$ (compare with fig. ??). On the other hand, the CDM abundance, $\mathrm{BR}(B\to X_s\gamma)$ and the Higgs mass limit are all in good agreement with both statistics. In contrast, the $\delta a_\mu^{\rm SUSY}$ constraint cannot easily be fulfilled simultaneously, as shown by the fact that the posterior and the profile likelihood do not match the likelihood function.
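The difference between the two statistics can be illustrated with a toy calculation. The sketch below rests on assumptions that are not part of the analysis: a hypothetical two-parameter Gaussian likelihood evaluated on a grid (not the CMSSM likelihood), and a "log" prior taken as a density proportional to 1/(xy). The marginal posterior sums over the hidden parameter and therefore carries the prior weight, while the profile likelihood maximises over it and involves no prior at all; on an exact grid the profile is therefore identical for both priors, and in a real scan the prior can only affect it through where the sampler places its points.

```python
import math

# Hypothetical toy likelihood in two positive "mass-like" parameters (x, y);
# it stands in for the real CMSSM likelihood only for illustration.
def likelihood(x, y):
    return math.exp(-0.5 * ((x - 40.0) / 15.0) ** 2 - 0.5 * ((y - 30.0) / 10.0) ** 2)

def flat_prior(x, y):
    return 1.0

def log_prior(x, y):
    # Uniform in log x and log y, i.e. a density proportional to 1 / (x * y).
    return 1.0 / (x * y)

xs = [float(k) for k in range(1, 101)]  # grid 1, 2, ..., 100 in both directions
ys = list(xs)

def marginal_posterior_x(prior):
    """Posterior pdf of x: sum L * prior over y, then normalise over the grid."""
    post = [sum(likelihood(x, y) * prior(x, y) for y in ys) for x in xs]
    norm = sum(post)
    return [p / norm for p in post]

def profile_likelihood_x():
    """Profile likelihood of x: maximise L over y; the prior never enters."""
    prof = [max(likelihood(x, y) for y in ys) for x in xs]
    peak = max(prof)
    return [p / peak for p in prof]

def grid_mean(weights):
    return sum(x * w for x, w in zip(xs, weights))

post_flat = marginal_posterior_x(flat_prior)
post_log = marginal_posterior_x(log_prior)
profile = profile_likelihood_x()

print(f"posterior mean of x, flat prior: {grid_mean(post_flat):.1f}")
print(f"posterior mean of x, log prior:  {grid_mean(post_log):.1f}")
print(f"profile likelihood peaks at x = {xs[profile.index(max(profile))]:.0f}")
```

The log prior pulls the posterior mean of x towards small values, mirroring the shift towards lower masses discussed above, while the profile likelihood is untouched.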

Given the tension between $\delta a_\mu^{\rm SUSY}$ and the other observables, we have also carried out a scan applying all observables but omitting the $\delta a_\mu^{\rm SUSY}$ constraint. The results are qualitatively similar to the ones presented here, except that the preference for low masses is further reduced. This further implies that the $\delta a_\mu^{\rm SUSY}$ constraint is indeed to a large extent overridden by all other data, which prefer a different region of parameter space.

5. Consistency and constraining power of the observables 

We now come back to examining in more detail the tension between the constraints from $\delta a_\mu^{\rm SUSY}$ and $\mathrm{BR}(B\to X_s\gamma)$, which we have already emphasized above. (Compare figs. ?? and ?? with figs. ?? and ??, respectively.)

[Figure panels: posterior pdf and profile likelihood for flat priors; posterior pdf and profile likelihood for log priors.]

Figure 12: As in fig. ??, but for a scan including all the constraints listed in table ?? (ALL).

Figure 13: As in fig. ??, but for a scan including all the constraints listed in table ?? (ALL). The change in the numerical evaluation of the profile likelihood for scans with different priors is due to the change in the efficiency with which the algorithm finds good-fitting points for the two different choices of metric, especially for small SUSY masses.

5.1 Priors and a tension between $\delta a_\mu^{\rm SUSY}$ and $\mathrm{BR}(B\to X_s\gamma)$

The tension is clearly exposed in fig. 15, where we include all the constraints (ALL). It is stronger with flat priors but remains substantial also in the case of log priors, and it is therefore stronger for the posterior pdf than for the profile likelihood, since the former is more strongly prior dependent. We notice that the best-fit point (cross) depends quite strongly on the choice of prior, with the log prior scan able to find a point that has lower values of the masses and hence a larger $\delta a_\mu^{\rm SUSY}$. On the contrary, the posterior mean (circle) is very similar in both cases. This is because the posterior distribution tends to favor regions with low $\delta a_\mu^{\rm SUSY}$ once all constraints are taken into account, and even the change of priors can extend the 95% contour only mildly towards larger $\delta a_\mu^{\rm SUSY}$ values.

Figure 14: As in fig. 13 (ALL), but for the main observables.
The influence of priors and their interaction with the $\delta a_\mu^{\rm SUSY}$ and $\mathrm{BR}(B\to X_s\gamma)$ constraints is further investigated in fig. 16, where we plot equally weighted samples from the posterior pdf, so that the density of points represents probability density. The top panels show the probability density for $\delta a_\mu^{\rm SUSY}$ vs $m_0$, while the bottom row shows $\mathrm{BR}(B\to X_s\gamma)$ vs $m_0$. Red points are for the log prior case, green for the flat prior. From left to right, we change the set of constraints being imposed. The panels in the first column on the left have only physicality constraints, nuisance parameter constraints, Higgs and superpartner mass limits and the CDM abundance constraint imposed. The flat priors give a fairly large probability mass to the FP region, hence the predictions are dominated by the asymptotic SM value, $\delta a_\mu^{\rm SUSY}\sim 0$, while $\mathrm{BR}(B\to X_s\gamma)\sim 3.11\times 10^{-4}$. Both observational constraints (the horizontal dashed lines give the $1\sigma$ regions from the likelihood) prefer values different from these, hence the tension between the prior structure (and the CDM constraint) and both $\delta a_\mu^{\rm SUSY}$ and $\mathrm{BR}(B\to X_s\gamma)$.

Figure 15: 2D posterior pdf (left column) and profile likelihood (right column) for $\delta a_\mu^{\rm SUSY}$ and $\mathrm{BR}(B\to X_s\gamma)$ for the flat (upper row) and log priors (lower row) from a scan including constraints from all available observations (ALL). Notice that the change in the numerical evaluation of the profile likelihood for different priors is a consequence of the implicit change of metric in which the scan is executed. E.g., in the region of small SUSY masses (i.e., large $\delta a_\mu^{\rm SUSY}$ values) the log prior scan is much more detailed and can find better-fitting points that might have been missed by the linear prior scan.

Figure 16: Distribution of samples from the posterior pdf, showing the preferred values for $\delta a_\mu^{\rm SUSY}$ (top panels) and $\mathrm{BR}(B\to X_s\gamma)$ (bottom panels) for different combinations of constraints. Since the samples are drawn from the pdf, their density reflects the region's probability. Green points are for flat priors, red for log priors. The horizontal dashed lines give the $1\sigma$ interval preferred by observations; the solid line is the central value. The samples have been thinned by a factor of 20 for visualisation purposes.

Once the $\mathrm{BR}(B\to X_s\gamma)$ constraint is further imposed (second column from the left), the preference shifts strongly towards the FP region, as pointed out in [12] and explained above. Notice how, as a consequence, the favored range of $\delta a_\mu^{\rm SUSY}$ collapses even further towards zero, making the observed anomalous magnetic moment even more discrepant with the range favored in the CMSSM.

In contrast, imposing the $\delta a_\mu^{\rm SUSY}$ constraint instead of $\mathrm{BR}(B\to X_s\gamma)$ (third column from the left) shifts the bulk of the probability to smaller values of $m_0$, as low enough smuon and/or sneutrino masses are needed to produce a sufficiently large SUSY contribution to $(g-2)_\mu$. This, on the other hand, selects values of $\mathrm{BR}(B\to X_s\gamma)$ (which has not been imposed in this case) below the SM prediction, in strong disagreement with the experimental determination.

Finally, once both the $\delta a_\mu^{\rm SUSY}$ and the $\mathrm{BR}(B\to X_s\gamma)$ observations are imposed (rightmost column), the posterior settles in a compromise region, which is in fair agreement with the $\mathrm{BR}(B\to X_s\gamma)$ observation but still quite discrepant with $\delta a_\mu^{\rm SUSY}$. This comes about because the likelihood for $\delta a_\mu^{\rm SUSY}$ is large in the region to which the other constraints, and in particular $\mathrm{BR}(B\to X_s\gamma)$ (combined with the flat prior), give a very low probability.

Constraints          Data     Flat priors                 Log priors
                     points   χ²_min   ⟨χ²⟩    D_KL       χ²_min   ⟨χ²⟩    D_KL
PHYS+NUIS            4        0.06     3.89    1.00       0.02     3.88    1.00
+CDM                 5        0.05     ?       ?          ?        ?       ?
+BSG                 5        0.31     6.48    1.11       0.10     5.48    1.21
+GM2                 5        0.27     11.55   1.35       0.13     6.38    1.20
+COLL+CDM            5+       0.28     4.60    ?          0.15     5.04    2.98
+COLL+BSG            5+       0.99     6.82    1.11       0.45     6.54    1.24
+COLL+GM2            5+       1.79     13.43   1.10       0.17     9.92    1.49
(illegible)          ?        ?        ?       ?          ?        ?       ?
+COLL+CDM+GM2        6+       0.62     9.24    2.90       0.43     7.49    3.23
+COLL+CDM+BSG+GM2    7+       6.27     15.83   3.48       4.67     14.89   3.39
ALL but GM2          10+      3.51     9.45    3.42       3.22     9.51    3.28
ALL but CDM          10+      12.17    18.86   1.10       4.14     18.30   1.24
ALL                  11+      13.51    19.29   3.38       11.90    18.41   3.26

Table 4: Best-fit chi-square, $\chi^2_{\rm min}$; average chi-square over the posterior, $\langle\chi^2\rangle$; and amount of information contained in the data, quantified using the KL divergence criterion ($D_{\rm KL}$ column, given by eq. (2.10)). The information content has been normalized to the information from priors alone with physicality and nuisance constraints imposed (PHYS+NUIS). The column "Data points" gives the number of constraints applied, where a + indicates that collider limits on the Higgs and superpartner masses have been applied.

Hence we conclude that the only observable favoring smaller values of $m_{1/2}$ and $m_0$ is $\delta a_\mu^{\rm SUSY}$, while all the others are either neutral or, as is especially the case with $\mathrm{BR}(B\to X_s\gamma)$, favor the FP region [12].



5.2 Quality of fit and information content 

In the light of the different constraining power of the observables, it is interesting to investigate summary statistics for the information content and the quality of fit for different combinations of data and for the two choices of priors. These are given in table 4. The information content is quantified using the KL divergence, which gives the information increase in going from the prior to the posterior; for each prior it is normalized to the information from priors alone with physicality and nuisance constraints imposed.
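In its standard form the KL divergence is $D_{\rm KL} = \int P(\theta|d)\,\ln[P(\theta|d)/\pi(\theta)]\,d\theta$; we assume here that eq. (2.10) takes this form. As a minimal one-dimensional sketch (the flat prior on [0, 10] and the Gaussian likelihood widths are hypothetical choices, not taken from the analysis), the code below evaluates $D_{\rm KL}$ on a grid, checks it against the analytic value for a Gaussian posterior well inside a flat prior, and expresses one result relative to another, mimicking the normalisation to a reference case used in table 4.

```python
import math

def kl_divergence(posterior, prior, dx):
    """Assumed standard D_KL = sum_k P_k ln(P_k / pi_k) dx on a uniform grid, in nats."""
    return sum(p * math.log(p / q) * dx for p, q in zip(posterior, prior) if p > 0.0)

def normalise(values, dx):
    norm = sum(values) * dx
    return [v / norm for v in values]

# Hypothetical example: flat prior density 1/10 on theta in [0, 10].
n = 20001
dx = 10.0 / (n - 1)
theta = [k * dx for k in range(n)]
prior = [0.1] * n

def posterior_for(sigma, centre=5.0):
    """Posterior from a Gaussian likelihood of width sigma times the flat prior."""
    like = [math.exp(-0.5 * ((t - centre) / sigma) ** 2) for t in theta]
    return normalise([lk * q for lk, q in zip(like, prior)], dx)

dkl_weak = kl_divergence(posterior_for(2.0), prior, dx)    # "reference" data
dkl_strong = kl_divergence(posterior_for(0.5), prior, dx)  # more constraining data

# Analytic value for a Gaussian well inside a flat prior of width W:
# D_KL = ln(W / (sigma * sqrt(2 pi e))).
analytic = math.log(10.0 / (0.5 * math.sqrt(2.0 * math.pi * math.e)))
print(f"D_KL (sigma = 0.5): {dkl_strong:.4f} nats, analytic {analytic:.4f}")
print(f"normalised to the sigma = 2 case: {dkl_strong / dkl_weak:.2f}")
```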

First, looking at the quality-of-fit statistics (both the minimum and the average of the $\chi^2$ over the posterior), we notice that when the $\delta a_\mu^{\rm SUSY}$ constraint is added on top of $\mathrm{BR}(B\to X_s\gamma)$, the quality of fit worsens dramatically for both choices of priors. This reflects the tension between the two observables. Even when the $\delta a_\mu^{\rm SUSY}$ constraint is applied on its own (cases +GM2 and +COLL+GM2), the fit can only achieve a fairly poor average $\chi^2$, with the situation being worse for the linear prior scan, which gives more weight to the FP region, at odds with the experimental value of $\delta a_\mu^{\rm SUSY}$. Also, the best-fit $\chi^2$ is around 3 for both priors when we include all observables but $\delta a_\mu^{\rm SUSY}$ (case ALL but GM2). Such a fit has nominally 2 degrees of freedom (dof), if we neglect the effect of imposing the collider limits. So a classical quality-of-fit test would give a $\chi^2/{\rm dof}$ of 1.5, which is not very large. (Although of course one has to keep in mind that such a value is difficult to interpret statistically, as clearly the $\chi^2$ is not chi-square distributed here!) However, when $\delta a_\mu^{\rm SUSY}$ is added (case ALL), the best-fit value becomes about three times worse, giving $\chi^2/{\rm dof} > 6$, which is clearly unacceptable. This again indicates a strong tension between $\delta a_\mu^{\rm SUSY}$ and the remaining observables, which apparently cannot all be fulfilled at the same time within the CMSSM.

Second, the best-fit $\chi^2$ values and the posterior average $\chi^2$ are almost invariably better (albeit often not dramatically so) for the log prior scan. For the best-fit values, this is a consequence of the finer detail with which the low-mass region can be explored with this prior: the scan is able to find better-fitting points that can more easily be missed by the flat prior scan. The better average values reflect the fact that the log prior scan in general finds better-fitting points than the flat prior one.

Finally, the information gain with respect to both priors is dominated by the CDM constraint, which alone accounts for about 80% of the combined constraining power of all the data in the log prior case and for about 95% in the flat prior case. This follows from taking the ratio of the $D_{\rm KL}$ value for the case +CDM to that for the case ALL. Taken on its own, each of the $\mathrm{BR}(B\to X_s\gamma)$ and $\delta a_\mu^{\rm SUSY}$ observables has less than half the constraining power of the CDM abundance (compare the $D_{\rm KL}$ values of the +CDM case with either +BSG or +GM2). When added on top of CDM, they contribute at most about an extra 10% of information on the parameters. This is also evident from the case ALL but CDM, where all the constraints have been applied except for the CDM abundance. In this case the information content is only very mildly increased from the PHYS+NUIS value.



6. Some implications for LHC and DM searches 

We now discuss some ensuing implications for prospects of experimental CMSSM tests at the LHC and in DM searches. We start by plotting in fig. 17 the posterior pdf for the gluino mass $m_{\tilde g}$ and the lightest Higgs mass $m_h$ for the flat and log priors and for different combinations of data. (The profile likelihood has a broadly similar behavior and is not shown in the figure.) Since $m_{\tilde g}\simeq 2.7\,m_{1/2}$, its posterior distributions (including only physicality constraints, PHYS, marked with dotted black; the case PHYS+NUIS+COLL+CDM with dashed blue; and all constraints, ALL, with solid red) reflect the respective plots of $m_{1/2}$ in figs. ??, 5 and 12. (Although the plot only shows the range up to 6 TeV, the pdf for PHYS remains approximately flat up to $\sim 8$ TeV.) In the case of the flat prior one can observe a significant narrowing of the spread of $m_{\tilde g}$ as more constraints are applied (corresponding to increasing line thickness). The log prior instead (bottom left panel of fig. 17) features a shift of $m_{\tilde g}$ towards lower values ($< 2$ TeV) almost independently of the constraining power of the data applied, a reflection of the log prior giving more weight to lower values of $m_{1/2}$, as mentioned earlier. The dependence of $m_{\tilde g}$ on the prior choice is still significant but, with the LHC reach expected to be around 2.7-3 TeV, most of the gluino mass range will be explored even in the less optimistic case of the flat prior [?, 12].

[Figure 17 panels: gluino mass (TeV); lightest Higgs mass (GeV).]

Figure 17: Posterior pdf for the gluino mass and the lightest Higgs mass, for flat priors (top panels) and log priors (bottom panels) for different combinations of data. The constraints applied increase with increasing line thickness. Within each panel: the dotted black line has only physicality constraints (PHYS); the blue, dashed line has physicality constraints, SM parameter constraints, collider Higgs and superpartner mass limits and CDM abundance data imposed (PHYS+NUIS+COLL+CDM); the thickest, solid red line has all constraints applied (ALL). Though not plotted in the figure, the profile likelihood shows a qualitatively similar behaviour.



Turning next to the light Higgs: in the CMSSM its couplings to $ZZ$ and $WW$ in most cases closely resemble those of the SM Higgs boson with the same mass. (However, note some exceptions mentioned in subsection ??.) With both priors the posterior pdf again peaks more strongly and shifts to the left with an increasing number of constraints. After all the constraints have been applied, the posterior features a rather sharp cutoff around 122 GeV, similarly to the result of our detailed study [11]. (Note also that for the log prior much of the posterior probability for the Higgs mass lies below the LEP limit on the SM-like Higgs, a reflection of our more refined treatment of the LEP limit.) This mass range is within reach of the currently operating Tevatron but will actually be rather challenging for the LHC, where it may take several years to explore it.

Finally, we investigate the implications for direct dark matter detection experiments. In fig. 18 we plot the posterior pdf (left panels) and the profile likelihood (right panels) in the plane spanned by $\sigma_p^{\rm SI}$, the spin-independent cross section for neutralino DM scattering off a proton, and the neutralino mass $m_\chi$, for the flat (upper row) and log (lower row) priors. The current strongest experimental 90% CL limits from CDMS [50], XENON-10 [51] and ZEPLIN-II [52] have also been marked for comparison (although they have not been imposed as constraints in the analysis).
Figure 18: Posterior pdf (left column) and profile likelihood (right column) for the spin-independent scattering cross section of the neutralino WIMP off a proton versus the neutralino mass, for flat priors (top row) and log priors (bottom row), for a scan including all available constraints (ALL). The inner and outer contours enclose the respective 68% and 95% regions for both statistics. The cross gives the best-fit point, the filled circle is the posterior mean. We also plot some recent 90% upper limits for comparison (which, however, have not been included as constraints in the scan).



Our presentation here follows our earlier studies [11, 12], where the direct detection quantities were discussed, accounting fully for the first time for all relevant particle physics sources of uncertainty and marginalising over nuisance parameters. (There still remain hadronic uncertainties which can change $\sigma_p^{\rm SI}$ by up to a factor of ten [?].) It was shown that, with flat priors, the strong preference for the FP region leads to a rather optimistic scenario for spin-independent scattering off a nucleon, as most of the posterior probability was found to be concentrated around $\sigma_p^{\rm SI}\sim 10^{-?}$ pb.

Our updated results in fig. 18 still show such relatively high values (and a long $m_\chi$-dependent tail) for the posterior pdf with the flat prior. The profile likelihood follows a similar trend, but shows a somewhat stronger preference for large values of $\sigma_p^{\rm SI}$, with the best-fit point around $\sigma_p^{\rm SI}\sim 1.7\times 10^{-?}$ pb. Applying the log prior (which favors lower masses) significantly reduces the contribution from the FP region. The best-fit point shifts to a value about one order of magnitude below the best-fit point found with the flat prior scan. (However, notice from table 4 that the quality of fit of both points is very similar.) Finally, we have also investigated the case where all constraints but the $\delta a_\mu^{\rm SUSY}$ observation are applied. Although this is not shown here, this case yields results very similar to the case ALL plotted in fig. 18.

The dependence on the choice of priors remains significant, which calls for caution in drawing strong conclusions regarding prospects for DM searches.* Despite this, with experiments aiming to reach down to $10^{-10}$ pb, most of the high-probability range of $\sigma_p^{\rm SI}$ will be covered.

In conclusion, the current data are not yet constraining enough to allow one to reliably predict the values of some key observables discussed here. However, even at present the predicted spread of their values makes prospects for LHC searches for the gluino and the light Higgs (the latter also at the Tevatron) and for DM direct detection searches highly encouraging.

7. Summary and conclusions 

We have subjected current constraints on the CMSSM parameters to detailed scrutiny using a state-of-the-art scanning technique (MultiNest), which reduces the computational burden by over 2 orders of magnitude with respect to previously employed MCMC techniques. We investigated the impact of prior choices and of applying different combinations of constraints, both from the point of view of Bayesian statistics and using the profile likelihood. We have updated and applied all relevant constraints, from cosmology, collider limits, EW observables, $b\to s\gamma$, $\delta a_\mu^{\rm SUSY}$ and B-physics.

We have found that current data are not yet constraining enough to allow drawing statistically robust conclusions on allowed ranges for the CMSSM parameters. Conclusions regarding the values of $m_0$ and $\tan\beta$ are particularly sensitive to the choice of priors, statistics and data included. We find that in general values of $m_{1/2}\lesssim 2$ TeV are preferred, while for $A_0$ positive values are weakly favored. We have highlighted the complex interplay between priors, observables and statistics, which intrinsically limits the constraining power of the observables on the values of the CMSSM parameters.

For this reason we feel that it is difficult to argue that one choice of parameters is in some sense superior to any other. In particular, the standard choice of CMSSM parameters as given by eq. (??) is as good as the "fundamental" set in terms of $\mu$ and $B$ advocated in [19]. In fact, if the choice of parameterization strongly impacts the predictions for measurable quantities (e.g., $\sigma_p^{\rm SI}$, as in ref. [19]), this should be interpreted as a case in which theoretical prejudice plays a stronger role than the constraints from the data. Clearly, better data are required in order to constrain the parameters of the model unambiguously (i.e., independently of the choice of priors and statistics). This conclusion is expected to apply more generally to more complex phenomenological models, with a larger number of free parameters than the CMSSM.

*It was recently argued in ref. [19] that using a different parameterization of the CMSSM leads to even more optimistic detection prospects. This dependence on the choice of parameterization can be seen as another way of phrasing the prior dependence, and therefore the same caution applies in this case.

Among the observables, the most constraining role is played by $\Omega_\chi h^2$, $m_h$, $\mathrm{BR}(B\to X_s\gamma)$ and $\delta a_\mu^{\rm SUSY}$. The latter (still somewhat controversial) constraint is unique in favoring smaller $m_{1/2}$ and $m_0$, but in a numerical analysis its impact is outweighed by the other constraints, especially $\mathrm{BR}(B\to X_s\gamma)$, which favors the FP region. The numerical measure of tension between the two constraints is prior dependent, but it is clear that the two favor different regions of the CMSSM parameter space.

In the light of our results, some comments are in order about the conclusions obtained in our previous works [?]. Our previous findings regarding the posterior obtained with flat priors have been confirmed by the present analysis, obtained using a different scanning algorithm. In particular, the preference for the FP region brought about by $\mathrm{BR}(B\to X_s\gamma)$ has been exposed here more clearly, and the tension with the $\delta a_\mu^{\rm SUSY}$ measurement that we had previously remarked upon has been further highlighted. As far as one is prepared to assume flat priors, these conclusions are therefore solid. This work has further investigated previous hints that current data are, however, not sufficiently strong to give conclusions that are fully independent of prior assumptions. This has allowed us to reinforce previous cautionary warnings on the interpretation of the posterior, which at present is still strongly influenced by the prior for some of the quantities. We also pointed out that the numerical evaluation of the profile likelihood is not immune from the influence of the chosen prior measure. Regarding direct and indirect detection prospects, we found that our previous predictions for direct detection experiments [13] are robust with respect to changes in the prior and in the statistical measure. Although we have not addressed indirect detection prospects in this work (see [?]), qualitatively we expect that the result will be dominated by residual astrophysical uncertainties (galactic halo profile, propagation parameters, boost factor) rather than by the statistical issues connected with the particle physics aspect. Therefore we can conclude that the results of [14] qualitatively hold true.

We have quantified the information content of the different combinations of data using an information-theoretical measure and have found that it is dominated (about 80% for log priors and about 95% for flat priors) by the constraining power of the cosmological dark matter abundance determination.

Finally, despite the above uncertainties, prospects for dark matter direct detection and superpartner discovery at the LHC remain fairly positive.

Note added: When this work was being finalized, a paper [?] appeared which employs an MCMC chi-square analysis of the CMSSM and seems to reach rather different conclusions. Ref. [?] favors the region of much lower $m_0 < 250$ GeV (at 68% CL) and also claims that the determination of $\Omega_\chi h^2$ is not very relevant in constraining the CMSSM parameters. We note that, compared to [?], the chi-square expression employed in [?] no longer contains an extra term whose role was to suppress (somewhat artificially) the weight of the FP region. Also, contrary to refs. [?], $\Omega_\chi h^2$ cannot be used to unambiguously determine $m_0$ in terms of the other CMSSM parameters if one also varies SM parameters, e.g., $M_t$ (compare fig. 4 in ref. [?]). Furthermore, there are some indications that the code used in refs. [?] (FeynHiggs) to derive the light Higgs mass value might disagree with the results obtained using SOFTSUSY (employed here) [?]. However, without a detailed comparison of the numerical outputs (which we have invited the authors of [?] to carry out), we are at present unable to track down conclusively the reasons for the discrepancies between our conclusions.
A. Nested Sampling and the MultiNest algorithm




Figure 19: Cartoon illustrating (a) the posterior of a two-dimensional problem; and (b) the transformed $\mathcal{L}(X)$ function, where the prior volumes $X_i$ are associated with each likelihood $\mathcal{L}_i$.
Nested sampling [22] is a Monte Carlo technique aimed at efficient evaluation of the Bayesian evidence, but it also produces posterior inferences as a by-product. It calculates the evidence by transforming the multi-dimensional evidence integral into a one-dimensional integral that is easy to evaluate numerically. This is accomplished by defining the prior volume $X$ as $dX = p(m)\,d^D m$, so that

$$X(\lambda) = \int_{\mathcal{L}(m) > \lambda} p(m)\, d^D m, \tag{A.1}$$

where $\mathcal{L}(m) = p(d|m)$ is the likelihood function and the integral extends over the region(s) of parameter space contained within the iso-likelihood contour $\mathcal{L}(m) = \lambda$. Assuming that $\mathcal{L}(X)$, i.e. the inverse of (A.1), is a monotonically decreasing function of $X$ (which is trivially satisfied for most posteriors), the evidence integral (2.3) can then be written as

$$Z = p(d) = \int_0^1 \mathcal{L}(X)\, dX. \tag{A.2}$$



Thus, if one can evaluate the likelihoods $\mathcal{L}_i = \mathcal{L}(X_i)$, where $X_i$ is a sequence of decreasing values,

$$0 < X_M < \cdots < X_2 < X_1 < X_0 = 1, \tag{A.3}$$

as shown schematically in fig. 19, the evidence can be approximated numerically using standard quadrature methods as a weighted sum

$$Z = \sum_{i=1}^{M} \mathcal{L}_i w_i. \tag{A.4}$$

In the following we will use the simple trapezium rule, for which the weights are given by $w_i = \frac{1}{2}(X_{i-1} - X_{i+1})$. An example of a posterior in two dimensions and its associated function $\mathcal{L}(X)$ is shown in fig. 19.
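The quadrature (A.4) with trapezium weights fits in a few lines. This is a minimal sketch under stated assumptions: the caller supplies $X_1 > \dots > X_M$, $X_0 = 1$ as in (A.3), and we set $X_{M+1} = 0$ so that the last weight is defined (a boundary convention the text does not specify). The toy likelihood $\mathcal{L}(X) = 1 - X$ is hypothetical and is chosen because it decreases monotonically and $\int_0^1 (1 - X)\,dX = 1/2$ is known exactly.

```python
import math

def trapezium_evidence(likelihoods, volumes):
    """Z = sum_i L_i w_i with trapezium weights w_i = (X_{i-1} - X_{i+1}) / 2.

    `volumes` holds X_1 > X_2 > ... > X_M; X_0 = 1 is prepended and
    X_{M+1} = 0 is appended (assumed boundary convention)."""
    x = [1.0] + list(volumes) + [0.0]
    z = 0.0
    for i, like in enumerate(likelihoods, start=1):
        z += like * 0.5 * (x[i - 1] - x[i + 1])
    return z

n_live = 100
volumes = [math.exp(-i / n_live) for i in range(1, 2001)]  # X_i = exp(-i/N)
likes = [1.0 - x for x in volumes]                          # toy L(X) = 1 - X
z = trapezium_evidence(likes, volumes)
print(f"Z = {z:.5f} (exact integral 0.5)")
```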

This technique allows one to reduce the computational burden to about $10^{?}$ likelihood evaluations.

A.1 Evidence Evaluation



The nested sampling algorithm performs the summation (A.4) as follows. To begin, the iteration counter is set to $i = 0$ and $N$ "live" (or "active") samples are drawn from the full prior $p(m)$ (which is often simply the uniform distribution over the prior range), so the initial prior volume is $X_0 = 1$. The samples are then sorted in order of their likelihood, and the smallest (with likelihood $\mathcal{L}_0$) is removed from the live set and replaced by a point drawn from the prior subject to the constraint that the point has a likelihood $\mathcal{L} > \mathcal{L}_0$. The corresponding prior volume contained within this iso-likelihood contour will be a random variable given by $X_1 = t_1 X_0$, where $t_1$ follows the distribution $\Pr(t) = N t^{N-1}$ (i.e. the probability distribution for the largest of $N$ samples drawn uniformly from the interval $[0, 1]$). At each subsequent iteration $i$, the discarding of the lowest-likelihood point $\mathcal{L}_i$ in the live set, the drawing of a replacement with $\mathcal{L} > \mathcal{L}_i$ and the reduction of the corresponding prior volume $X_i = t_i X_{i-1}$ are repeated, until the entire prior volume has been traversed. The algorithm thus travels through nested shells of likelihood as the prior volume is reduced.
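The statistics of the shrinkage factor $t$ can be checked by simulation. In this sketch (the values $N = 50$ and 20000 trials are arbitrary choices), each $t$ is generated as the largest of $N$ uniform draws on $[0, 1]$, so that it follows $\Pr(t) = N t^{N-1}$; equivalently, $-\log t$ is exponentially distributed with rate $N$, which is where the moments quoted in (A.5) come from.

```python
import math
import random
import statistics

def shrinkage_factor(n_live, rng):
    """One shrinkage factor t: the largest of n_live uniform draws on [0, 1)."""
    return max(rng.random() for _ in range(n_live))

rng = random.Random(42)
n_live = 50
log_t = [math.log(shrinkage_factor(n_live, rng)) for _ in range(20000)]

mean_log_t = statistics.fmean(log_t)
sd_log_t = statistics.stdev(log_t)
print(f"E[log t]     = {mean_log_t:.5f}  (expected {-1.0 / n_live:.5f})")
print(f"sigma[log t] = {sd_log_t:.5f}  (expected {1.0 / n_live:.5f})")
```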

The mean and standard deviation of $\log t$, which dominates the geometrical exploration, are

$$E[\log t] = -\frac{1}{N}, \qquad \sigma[\log t] = \frac{1}{N}. \tag{A.5}$$

Since each value of $\log t$ is independent, after $i$ iterations the prior volume will shrink down such that $\log X_i \approx -(i \pm \sqrt{i})/N$. Thus, one takes $X_i = \exp(-i/N)$.
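Putting these steps together, the loop of section A.1 with the estimate $X_i = \exp(-i/N)$ can be sketched on a toy problem whose evidence is known. Everything problem-specific is a hypothetical assumption: a unit two-dimensional Gaussian likelihood; a uniform prior on $[-5, 5]^2$ (so that $Z \simeq 2\pi/100$); rectangle weights $w_i = X_{i-1} - X_i$ instead of the trapezium rule, which keeps the bookkeeping with the final live points simple; a relative tolerance on $\mathcal{L}_{\max} X_i / Z$ as stopping rule; and naive rejection sampling from the prior for the constrained draws, which is exactly the inefficiency that ellipsoidal sampling and MultiNest are designed to remove.

```python
import math
import random

def logaddexp(a, b):
    """Stable log(exp(a) + exp(b)), allowing -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def log_likelihood(m):
    """Hypothetical toy likelihood: unit 2D Gaussian centred at the origin."""
    return -0.5 * (m[0] ** 2 + m[1] ** 2)

def sample_prior(rng):
    """Hypothetical toy prior: uniform on [-5, 5]^2, i.e. density 1/100."""
    return (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))

def nested_sampling(n_live=100, tol=0.05, seed=1):
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(m) for m in live]
    log_z, x_prev, i = -math.inf, 1.0, 0  # X_0 = 1
    while True:
        i += 1
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        x_i = math.exp(-i / n_live)  # deterministic estimate of X_i
        log_z = logaddexp(log_z, logl_min + math.log(x_prev - x_i))
        x_prev = x_i
        # Replace the worst point by a prior draw with L > L_min (rejection;
        # the acceptance rate is about X_i, so this only works for toys).
        while True:
            m = sample_prior(rng)
            logl = log_likelihood(m)
            if logl > logl_min:
                break
        live[worst], live_logl[worst] = m, logl
        # Stop when L_max * X_i is below a fraction `tol` of the current Z.
        if max(live_logl) + math.log(x_i) < log_z + math.log(tol):
            break
    # Each remaining live point carries an equal share X_i / N of the volume.
    for logl in live_logl:
        log_z = logaddexp(log_z, logl + math.log(x_prev / n_live))
    return log_z, i

log_z, n_iter = nested_sampling()
exact = math.log(2.0 * math.pi / 100.0)  # prior edges at 5 sigma: truncation negligible
print(f"log Z = {log_z:.3f} (exact {exact:.3f}) after {n_iter} iterations")
```

The scatter of such an estimate around the exact value is of order $\sqrt{H/N}$, with $H$ the information gained from prior to posterior, so increasing $N$ tightens it.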

A.2 Stopping Criterion

The nested sampling algorithm should be terminated on determining the evidence to some specified precision. One way would be to proceed until the evidence estimated at each replacement changes by less than a specified tolerance. This could, however, underestimate the evidence in (for example) cases where the posterior contains any narrow peaks close to its maximum. Ref. [?] provides an adequate and robust condition by determining an upper limit on the evidence that can be determined from the remaining set of current active points. By selecting the maximum likelihood $\mathcal{L}_{\max}$ in the set of active points, one can safely assume that the largest evidence contribution that can be made by the remaining portion of the posterior is $\Delta Z_i = \mathcal{L}_{\max} X_i$, i.e. the product of the remaining prior volume and the maximum likelihood value. We choose to stop when this quantity would no longer change the final evidence estimate by more than some user-defined value (we use 0.5 in log-evidence).
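The stopping rule can be written as a small predicate. This is a sketch, not the MultiNest source code, and the function name and signature are hypothetical: it works in log space, takes $\Delta Z_i = \mathcal{L}_{\max} X_i$ as described above, and returns true once $\log(Z + \Delta Z_i) - \log Z$ falls below the tolerance (0.5 by default, the value quoted in the text).

```python
import math

def should_stop(log_z, log_l_max, log_x, tol_log_z=0.5):
    """True once the remaining live points can no longer shift log Z by tol_log_z.

    Uses the bound Delta Z = L_max * X_i, so the possible change in log Z is
    log(Z + Delta Z) - log Z = log(1 + Delta Z / Z)."""
    log_ratio = (log_l_max + log_x) - log_z  # log(Delta Z / Z); +inf while Z = 0
    if log_ratio > 30.0:
        # log1p(exp(r)) ~ r here; this also avoids OverflowError from math.exp.
        change = log_ratio
    else:
        change = math.log1p(math.exp(log_ratio))
    return change < tol_log_z

print(should_stop(log_z=0.0, log_l_max=0.0, log_x=math.log(0.1)))  # True
print(should_stop(log_z=0.0, log_l_max=0.0, log_x=math.log(2.0)))  # False
```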

A.3 Posterior Inferences

Once the evidence $Z$ is found, posterior inferences can be easily generated using the full sequence of discarded points from the nested sampling process, i.e. the points with the lowest likelihood value at each iteration $i$ of the algorithm. Each such point is simply assigned the probability weight

$$p_i = \frac{\mathcal{L}_i w_i}{Z}. \tag{A.6}$$

These samples can then be used to calculate inferences of posterior parameters such as means, standard deviations, covariances and so on, or to construct marginalised posterior distributions.
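A minimal sketch of (A.6) and of its typical use follows. The weights are computed in log space to avoid underflow, dividing by their sum so that $Z$ is effectively rebuilt from the same terms, and weighted samples are then turned into equally weighted ones by multinomial resampling, the kind of sample plotted in fig. 16. The four-point input is a hypothetical toy and the function names are illustrative.

```python
import math
import random

def posterior_weights(log_l, log_w):
    """p_i = L_i w_i / Z, with Z = sum_i L_i w_i, evaluated in log space."""
    log_terms = [a + b for a, b in zip(log_l, log_w)]
    top = max(log_terms)
    terms = [math.exp(t - top) for t in log_terms]  # rescale by the largest term
    total = sum(terms)
    return [t / total for t in terms]

def weighted_mean(values, weights):
    return sum(v * p for v, p in zip(values, weights))

def equally_weighted(values, weights, n, seed=0):
    """Resample n points with replacement with probabilities p_i (multinomial)."""
    return random.Random(seed).choices(values, weights=weights, k=n)

# Hypothetical discarded points: parameter values, log L_i and log w_i.
theta = [0.0, 1.0, 2.0, 3.0]
log_l = [0.0, 0.0, 0.0, 0.0]
log_w = [math.log(w) for w in (0.1, 0.2, 0.3, 0.4)]

p = posterior_weights(log_l, log_w)
samples = equally_weighted(theta, p, n=20000)
print(f"weighted mean: {weighted_mean(theta, p):.3f}")
print(f"mean of equally weighted samples: {sum(samples) / len(samples):.3f}")
```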



A.4 Ellipsoidal Nested Sampling

The most challenging task in implementing the nested sampling algorithm is drawing samples from the prior within the hard constraint $\mathcal{L} > \mathcal{L}_i$ at each iteration $i$. Employing a naive approach that draws blindly from the prior would result in a steady decrease in the acceptance rate of new samples with decreasing prior volume (and increasing likelihood).

Ellipsoidal nested sampling [55] tries to overcome the above problem by approximating the iso-likelihood contour of the point to be replaced by a $D$-dimensional ellipsoid determined from the covariance matrix of the current set of live points. New points are then selected from the prior within this (enlarged) ellipsoidal bound until one is obtained that has a likelihood exceeding that of the discarded lowest-likelihood point. In the limit that the ellipsoid coincides with the true iso-likelihood contour, the acceptance rate tends to unity.
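The construction of the bound can be sketched as follows, assuming (this is our choice, not a detail given in the text) that the ellipsoid is scaled so that the live point with the largest Mahalanobis distance lies on its surface and is then enlarged by a hypothetical factor of 1.2. A uniform point inside the ellipsoid is obtained as $\mu + \sqrt{k}\,L u$, with $C = L L^T$ the Cholesky factorisation of the covariance and $u$ uniform in the unit $D$-ball. In the toy usage the live points fill a unit disk standing in for the region $\mathcal{L} > \mathcal{L}_i$; blind draws from a prior box $[-5, 5]^2$ would be accepted only with probability $\pi/100$.

```python
import math
import random

def mean_and_cov(points):
    n, d = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov

def cholesky(c):
    """Lower-triangular L with c = L L^T (c symmetric positive definite)."""
    d = len(c)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def mahalanobis_sq(x, mu, low):
    """(x - mu)^T C^{-1} (x - mu), via forward substitution L y = x - mu."""
    y = []
    for i in range(len(x)):
        s = (x[i] - mu[i]) - sum(low[i][k] * y[k] for k in range(i))
        y.append(s / low[i][i])
    return sum(v * v for v in y)

def bounding_ellipsoid(points, enlarge=1.2):
    """Ellipsoid {x : (x - mu)^T C^{-1} (x - mu) <= k} from the live points.

    k is the largest squared Mahalanobis distance of any live point, so all of
    them are enclosed, multiplied by a hypothetical enlargement factor."""
    mu, cov = mean_and_cov(points)
    low = cholesky(cov)
    k = max(mahalanobis_sq(p, mu, low) for p in points)
    return mu, low, k * enlarge

def sample_in_ellipsoid(mu, low, k, rng):
    """Uniform point: x = mu + sqrt(k) L u, with u uniform in the unit D-ball."""
    d = len(mu)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    r = rng.random() ** (1.0 / d)  # radius density proportional to r^(D-1)
    u = [r * v / norm for v in g]
    return [mu[i] + math.sqrt(k) * sum(low[i][j] * u[j] for j in range(d))
            for i in range(d)]

# Toy usage: 500 live points uniform in the unit disk (hypothetical contour).
rng = random.Random(7)
live = []
while len(live) < 500:
    x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    if x * x + y * y < 1.0:
        live.append([x, y])

mu, low, k = bounding_ellipsoid(live)
draws = [sample_in_ellipsoid(mu, low, k, rng) for _ in range(5000)]
acceptance = sum(px * px + py * py < 1.0 for px, py in draws) / len(draws)
print(f"acceptance inside the contour: {acceptance:.2f}")
```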



A.5 MultiNest Algorithm




Figure 20: Cartoon of ellipsoidal nested sampling from a simple bimodal distribution. In the 
top left-hand panel, we see that the ellipsoid represents a good bound to the active region. Going 
towards the r.h.s., as we nest inward we can see that the acceptance rate will rapidly decrease as the 
bound steadily worsens. The final picture underneath illustrates the increase in efficiency obtained 
by sampling from each clustered region separately. 

Ellipsoidal nested sampling as described above is efficient for simple unimodal posterior
distributions without pronounced degeneracies, but is not well suited to multi-modal
distributions. As advocated by [56] and shown in fig. 20, the sampling efficiency can be
substantially improved by identifying distinct clusters of live points that are well separated
and constructing an individual ellipsoid for each cluster. In some problems, however, some
modes of the posterior may possess a pronounced curving degeneracy, so that they more
closely resemble a (multi-dimensional) 'banana'. Such features are problematic for all
sampling methods, including the clustered ellipsoidal sampling technique of [56] mentioned
above. To sample with maximum efficiency from such distributions, the MultiNest algorithm
divides the live point set into sub-clusters, which are then enclosed in ellipsoids; a new
point is then drawn uniformly from the region enclosed by these 'overlapping' ellipsoids.
The number of points in each sub-cluster and the total number of sub-clusters are decided
by an 'expectation-maximization' algorithm so that the total sampling volume, which is
equal to the sum of the volumes of the ellipsoids enclosing the sub-clusters, is minimized.
This allows maximum flexibility and efficiency: a mode resembling a Gaussian is broken up
into relatively few sub-clusters, whereas a mode with a pronounced curving degeneracy (a
multi-dimensional 'banana') is broken up into a relatively large number of small
'overlapping' ellipsoids. The essence of this modification is illustrated in fig. 21.
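The text does not spell out how a point is drawn uniformly from a set of overlapping ellipsoids. One standard way, sketched below under our own assumptions, is to pick ellipsoid k with probability proportional to its volume, draw a point uniformly inside it, and accept that point with probability 1/n_e, where n_e is the number of ellipsoids containing it; the acceptance step removes the extra weight that overlap regions would otherwise receive. The representation of an ellipsoid as a centre plus a lower-triangular factor, and all helper names, are hypothetical, and the expectation-maximization step that chooses the sub-clusters is not reproduced here.

```python
import math
import random


def solve_lower(L, v):
    """Forward substitution: solve L y = v for lower-triangular L."""
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def contains(ell, x):
    """True if x lies in ell = (centre, L), i.e. |L^{-1}(x - centre)| <= 1."""
    mu, L = ell
    y = solve_lower(L, [x[i] - mu[i] for i in range(len(mu))])
    return sum(t * t for t in y) <= 1.0


def rel_volume(ell):
    """Volume up to the common unit-ball factor: |det L| = product of diagonal."""
    mu, L = ell
    return abs(math.prod(L[i][i] for i in range(len(mu))))


def sample_in_ellipsoid(ell, rng):
    """Uniform draw inside one ellipsoid (uniform unit-ball point mapped by L)."""
    mu, L = ell
    d = len(mu)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in g))
    r = rng.random() ** (1.0 / d)
    u = [r * t / norm for t in g]
    return [mu[i] + sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(d)]


def sample_union(ellipsoids, rng):
    """Uniform draw from the union of (possibly overlapping) ellipsoids."""
    vols = [rel_volume(e) for e in ellipsoids]
    while True:
        # Choose an ellipsoid with probability proportional to its volume.
        k = rng.choices(range(len(ellipsoids)), weights=vols)[0]
        x = sample_in_ellipsoid(ellipsoids[k], rng)
        # Count the ellipsoids containing x (ellipsoid k always does).
        n_e = 1 + sum(contains(e, x) for j, e in enumerate(ellipsoids) if j != k)
        # Accepting with probability 1/n_e cancels the extra weight of overlaps.
        if rng.random() < 1.0 / n_e:
            return x
```

As a check, for two unit disks whose centres are one unit apart, the fraction of samples falling in the lens-shaped overlap should approach the ratio of the overlap area to the union area (about 0.24), rather than the roughly 0.39 that naive volume-weighted sampling without the 1/n_e correction would give.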



Figure 21: Cartoon of the sub-clustering approach used to deal with degeneracies. The true
iso-likelihood contour contains the shaded region. The large enclosing ellipse is typical of that
constructed using our basic method, whereas sub-clustering produces the set of small ellipses.

The progress of the MultiNest algorithm is controlled by two main parameters: (i) the
number of live points $N$; (ii) the maximum efficiency $f$. These values can be chosen quite
easily, as outlined below. First, $N$ should be large enough that, in the initial sampling
from the full prior space, there is a high probability that at least one point lies in the
'basin of attraction' of each mode of the posterior. In later iterations, live points will
then tend to populate these modes. It should be remembered, of course, that $N$ must
always exceed the dimensionality $D$ of the parameter space. Also, in order to calculate the
evidence accurately, $N$ should be large enough that all regions of the parameter space
are sampled adequately. The parameter $f$ controls the sampling volume $V_i$ at the $i$th
iteration, which is equal to the sum of the volumes of the ellipsoids enclosing the live point
set, such that

$$V_i \geq \frac{X_i}{f}, \qquad \text{(A.7)}$$

where $X_i$ is the prior volume at the $i$th iteration of the MultiNest algorithm. The strict
inequality $V_i > X_i/f$ holds when, at the $i$th iteration, no set of ellipsoids enclosing the
live points can be found whose total volume $V_i$ is smaller than $X_i/f$.
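The role of the two tuning parameters can be illustrated with simple arithmetic. The sketch below rests on two assumptions of our own, neither taken from the text: (i) a posterior mode is idealized as a region holding a fraction p of the prior mass, so that N independent draws from the prior all miss it with probability (1 - p)^N; and (ii) the prior volume follows the usual nested-sampling estimate X_i ≈ exp(-i/N). The second function enforces eq. (A.7) by uniformly enlarging the ellipsoids when their total volume falls below X_i/f. All function names are hypothetical.

```python
import math


def prob_mode_seeded(p, n):
    """Probability that at least one of n prior draws lands in a region of
    prior mass p (ASSUMED idealization of a mode's basin of attraction)."""
    return 1.0 - (1.0 - p) ** n


def min_live_points(p, target, d):
    """Smallest N with prob_mode_seeded(p, N) >= target, and N > d."""
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    return max(n, d + 1)


def enlargement(ellipsoid_volumes, i, n_live, f, d):
    """Return (V_i, linear scale factor) so that V_i >= X_i / f, eq. (A.7).
    ASSUMES the estimate X_i ~ exp(-i / n_live) for the prior volume."""
    x_i = math.exp(-i / n_live)
    v = sum(ellipsoid_volumes)
    target = x_i / f
    if v >= target:
        # The ellipsoids are already at least as large as X_i / f.
        return v, 1.0
    # Scaling every ellipsoid by s in each of d dimensions multiplies the total
    # volume by s**d, so s = (target / v)**(1/d) brings it up to X_i / f.
    return target, (target / v) ** (1.0 / d)
```

For instance, under assumption (i), a mode holding 1% of the prior mass is seeded with at least 99% probability once N ≥ 459. Lowering f below 1 raises the floor X_i/f on the sampling volume, trading efficiency for a more conservative bound.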

For all the models analysed in this paper, we used 4,000 live points with the maximum
efficiency f set to 1. This corresponds to around 500,000 likelihood evaluations, taking
approximately 48 hours on four 3.0 GHz Intel Woodcrest processors.
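As a rough reading of these figures, and assuming (our assumption, not stated in the text) that the likelihood evaluations were spread evenly over the four processors and dominated the run time, the quoted numbers imply

```latex
\frac{5\times 10^{5}\ \text{evaluations}}{4 \times 48\ \text{h} \times 3600\ \text{s\,h}^{-1}}
  = \frac{5\times 10^{5}\ \text{evaluations}}{6.912\times 10^{5}\ \text{s}}
  \approx 0.72\ \text{evaluations per processor-second},
```

that is, roughly 1.4 s of processor time per likelihood evaluation. Since both inputs are quoted as approximate, so is this figure.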